on:
We are seeking an experienced MLOps Engineer with strong expertise in Python and big data technologies to join our team. This role focuses on operational excellence, including optimizing feature engineering pipelines and maintaining machine learning models in production environments. The desired candidate will work closely with platform and data science teams to ensure scalable, reliable, and high-performance ML workflows using existing frameworks.

This position will be performed onsite five days a week from any of our client sites in Dallas, TX / Strongsville, OH / Pittsburgh, PA.

Future Duties and Responsibilities:
Optimize and maintain large-scale feature engineering pipelines using PySpark, Pandas, and PyArrow on Hadoop-based infrastructure.
Refactor and modularize ML codebases to enhance reusability, maintainability, and performance.
Collaborate with platform teams on compute capacity planning, resource allocation, and system upgrades.
Integrate with existing model serving frameworks to support testing, deployment, and rollback processes.
Monitor and troubleshoot production ML pipelines, ensuring high reliability, low latency, and cost efficiency.
Contribute to internal ML platforms by sharing insights, proposing improvements, and documenting best practices.
Build near real-time ML pipelines using Kafka and Spark Streaming.
Work with the AWS and SageMaker MLOps ecosystem.

Required Qualifications to be Successful in this role:
6+ years of experience in software engineering, data engineering, or MLOps roles.
Strong programming expertise in Python, with hands-on experience in Pandas, PySpark, and PyArrow.
Deep understanding of the Hadoop ecosystem, distributed computing, and performance tuning.
Experience with CI/CD pipelines and best practices in ML environments.
Hands-on experience with monitoring tools for ML pipeline health and performance.
Strong collaboration skills with experience working in cross-functional teams (platform, data science, engineering).
Experience contributing to or building internal MLOps frameworks/platforms.
Familiarity with SLURM clusters or other distributed job schedulers.
Exposure to Kafka, Spark Streaming, or other real-time data processing technologies.
Understanding of ML lifecycle management, including versioning, deployment, and drift detec