Senior AI Data Engineer

Intelliswift - An LTTS Company
Menlo Park, CA

Job Title: Senior AI Data Engineer

Location: Menlo Park, CA (Hybrid)

Duration: 6 months (with potential extension to long term)


Our client is looking for a Senior AI Data Engineer to build and scale next-generation data pipelines powering image generation systems. This role sits at the intersection of data engineering and ML systems, where pipelines not only process data but also invoke and orchestrate machine learning models at scale.


You’ll work on large-scale datasets (billions of records/images), enabling high-quality training data across dimensions like visual quality, prompt adherence, and content understanding.


Must-Have Skills

  • Advanced SQL and data pipeline expertise: complex queries, query optimization, and pipeline orchestration frameworks (Airflow or equivalent).
  • Experience integrating ML models into data pipelines: calling inference endpoints, managing model versions, batching requests, and handling inference failures at scale.
  • Demonstrated track record of building and operating production data pipelines that invoke ML models at scale.
  • Proficiency with AI-assisted coding agents (e.g., Copilot, Cursor, Codex). You will be expected to use AI tools as a force multiplier for writing, debugging, and reviewing code, building pipelines faster, and accelerating day-to-day engineering workflows.
  • Strong verbal and written communication skills, problem-solving ability, and cross-functional collaboration.


Nice-to-Have Skills

  • Working knowledge of embeddings and vector representations, including generating, storing, indexing, and querying embeddings.
  • Familiarity with content-understanding models such as image classifiers, object detection, OCR, NSFW detection, and aesthetic scoring.
  • Experience using LLM APIs for data tasks such as annotation (including prompt engineering), data cleaning, and evaluation.
  • Knowledge of generative AI, including diffusion models, image generation, and evaluation metrics (FID, CLIP score, etc.).


Education / Experience

  • Bachelor's degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field.
  • 5+ years of industry experience in data engineering, ML engineering, or a hybrid role involving both data pipelines and model serving/inference.


Key Responsibilities

AI-Augmented Data Pipelines:

  • Design and maintain large-scale pipelines that combine data transformations with ML model inference
  • Integrate classifiers, embeddings, and LLM-based processing into data workflows


Inference Orchestration:

  • Manage remote model execution within pipelines, including batching, retries, and async processing
  • Optimize performance, scalability, and reliability of inference systems


Embedding & Feature Engineering:

  • Build and maintain pipelines for generating and managing vector embeddings
  • Support similarity search and indexing use cases


Data Curation at Scale:

  • Source, clean, and curate datasets using a combination of SQL logic and model-derived signals
  • Ensure data quality, governance, and consistency


LLM-Based Workflows:

  • Develop pipelines using LLMs for annotation, evaluation, and data enrichment
  • Implement quality checks and audit mechanisms for model-driven outputs


Tooling & Frameworks:

  • Contribute to reusable tools and frameworks that simplify AI-powered data pipeline development

