Design, deploy, and operate end‑to‑end production ML pipelines across Dev, QA, and Prod environments.
Set up and manage AWS SageMaker pipelines, endpoints, and monitoring for large‑scale inference workloads, including embedding generation, named entity recognition, reranking, and video processing.
Own GPU and CPU infrastructure selection, scaling, and optimization, including instance benchmarking, autoscaling behavior, and load testing.
Deploy, monitor, and operate inference services that support hundreds of thousands of queries per day across text, image, and video pipelines.
Establish standardized ML deployment patterns at AP, including:
Containerization and orchestration strategies
Environment isolation (Dev / QA / Prod)
Versioned promotion, rollback, and recovery mechanisms
Implement monitoring, alerting, drift detection, and evaluation metrics for production ML systems, tracking latency, error rates, throughput, and model/data drift.
Enable A/B testing and controlled rollout strategies for ML models in partnership with engineering and product teams.
Partner closely with ML Engineers, Data Scientists, DevOps, and Platform teams to:
Operationalize new models and pipeline improvements
Promote systems across environments safely
Ensure deployments meet reliability, scale, and cost targets
Manage high-throughput I/O and data movement for large collections of media assets (text, images, video), avoiding CPU, network, and storage bottlenecks.
Reduce operational risk by enforcing reproducibility, observability, security, and cost control across production ML systems.
This role owns:
Deployment, scaling, and runtime operation of ML systems
ML infrastructure configuration and orchestration
Monitoring, alerting, A/B testing infrastructure, and drift detection
Reliability, cost control, and production governance
This role does NOT own:
Designing model architecture
Feature engineering or data science outputs
Model accuracy or inference logic
(These are owned by ML Engineers and Data Science.)
Required Skills & Experience
Hands‑on experience deploying and operating ML inference systems in production.
Experience with AWS SageMaker, including pipelines, endpoints, monitoring, and multi‑environment deployments.
Expertise in deploying PyTorch and TensorFlow models, from an operational and serving perspective.
Proven experience with model deployment and orchestration, including containerized inference and autoscaling.
Experience selecting, evaluating, and optimizing compute resources (GPU/CPU) for production ML workloads.
Experience setting up monitoring, evaluation metrics, and A/B testing frameworks for ML systems in production.
Ability to collaborate effectively with ML Engineers, Data Scientists, and platform teams in a shared ownership model.
Strongly Preferred
Experience running ML workloads over large‑scale text, image, and video datasets.
Operational experience supporting ML systems that involve Transformer‑based NLP models (e.g., BERT‑family models), computer vision models, and ranking and reranking systems.
Familiarity with operating systems built on common ML model types, such as convolutional and feed‑forward neural networks, ranking algorithms, and approximate nearest neighbor methods (e.g., HNSW).