Senior AI Data Engineer

Intelliswift - An LTTS Company

Menlo Park, CA

Job Title: Senior AI Data Engineer

Location: Menlo Park, CA (Hybrid)

Duration: 6 months (with potential extension to long term)

Our client is looking for a Senior AI Data Engineer to build and scale next-generation data pipelines powering image generation systems. This role sits at the intersection of data engineering and ML systems, where pipelines not only process data but also invoke and orchestrate machine learning models at scale.

You’ll work on large-scale datasets (billions of records/images), enabling high-quality training data across dimensions like visual quality, prompt adherence, and content understanding.

Must-Have Skills

Advanced SQL and data pipeline expertise: complex queries, query optimization, and pipeline orchestration frameworks (Airflow or equivalent).
Experience integrating ML models into data pipelines: calling inference endpoints, managing model versions, batching requests, and handling inference failures at scale.
Demonstrated track record of building and operating production data pipelines that invoke ML models at scale.
Proficiency with AI-assisted coding agents (e.g., Copilot, Cursor, Codex). Expected to leverage AI tools as a force multiplier for writing, debugging, and reviewing code, building pipelines faster, and accelerating day-to-day engineering workflows.
Strong verbal and written communication skills, problem-solving ability, and cross-functional collaboration.

Nice-to-Have Skills

Working knowledge of embeddings and vector representations, including generating, storing, indexing, and querying embeddings.
Familiarity with content-understanding models such as image classifiers, object detection, OCR, NSFW detection, and aesthetic scoring.
Experience using LLM APIs for data tasks such as prompt engineering for annotation, data cleaning, or evaluation.
Knowledge of generative AI, including diffusion models, image generation, and evaluation metrics (FID, CLIP score, etc.).

Education / Experience

Bachelor's degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field.
5+ years of industry experience in data engineering, ML engineering, or a hybrid role involving both data pipelines and model serving/inference.

Key Responsibilities

AI-Augmented Data Pipelines:

Design and maintain large-scale pipelines that combine data transformations with ML model inference
Integrate classifiers, embeddings, and LLM-based processing into data workflows

Inference Orchestration:

Manage remote model execution within pipelines, including batching, retries, and async processing
Optimize performance, scalability, and reliability of inference systems
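
For candidates who want a concrete picture of what batching, retries, and async processing can look like in practice, here is a minimal Python sketch. It is illustrative only and does not describe the client's actual stack: the names `make_batches`, `call_with_retries`, and `run_inference` are hypothetical, and `infer` stands in for whatever remote model endpoint a pipeline would call.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical type: an async function that scores one batch of inputs.
InferFn = Callable[[list[str]], Awaitable[list[int]]]


def make_batches(items: Sequence[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most batch_size."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


async def call_with_retries(
    infer: InferFn,
    batch: list[str],
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> list[int]:
    """Call infer on one batch, retrying failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await infer(batch)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


async def run_inference(
    infer: InferFn,
    items: Sequence[str],
    batch_size: int = 4,
    max_concurrency: int = 2,
) -> list[int]:
    """Score all items in batches, with a cap on concurrent requests."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(batch: list[str]) -> list[int]:
        async with semaphore:
            return await call_with_retries(infer, batch)

    # gather returns results in the same order as the batches were submitted.
    per_batch = await asyncio.gather(*(worker(b) for b in make_batches(items, batch_size)))
    return [score for scores in per_batch for score in scores]
```

A production pipeline would typically add per-request timeouts, retry only on transient error types, and record failed batches for later reprocessing; those are left out here to keep the sketch short.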

Embedding & Feature Engineering: