Senior MLOps Engineer - Analytics & AI

Athenahealth
Boston, MA


Role summary:
Help power the infrastructure behind athenahealth’s AI platform. The Senior MLOps Engineer is a Senior Associate-level role based in Boston, MA in a hybrid work model, responsible for engineering, maintaining, and advancing centralized AI platforms that support model training, serving, monitoring, and operational reliability. This role partners across engineering, data science, and platform teams to improve platform performance, security, and availability while enabling scalable AI development and deployment. This role reports to the Senior Manager.

Team summary:
Core AI is at the center of athenahealth’s company-wide initiative to unlock the value of healthcare information through data science, machine learning, and generative AI. Working with large-scale healthcare data, the team develops platforms and capabilities that support innovative AI use cases across the business and help improve the healthcare experience for providers and patients.

The MLOps team designs, develops, deploys, monitors, and supports the cloud-based platform that powers frontier, foundational, and customized models at athenahealth. This team focuses on operational reliability, platform performance, secure delivery, and scalable engineering practices. The Senior MLOps Engineer will help build and improve the tools, infrastructure, and workflows that make it easier for teams to train, deploy, observe, and manage AI services in production. This role works closely with AI engineers, data scientists, and infrastructure partners to ensure the platform is resilient, efficient, and ready to support evolving business and technical needs.

Essential Job Responsibilities:

  • Engineer and maintain centralized AI and MLOps platforms that support model training, deployment, and serving at scale.
  • Ensure the availability, performance, and reliability of cloud-based ML training and inference platforms through monitoring, alerting, and proactive issue identification.
  • Build automation and platform tools that improve deployment speed, service stability, and engineering confidence.
  • Deploy and maintain containerized services and ML workloads in Kubernetes-based environments.
  • Integrate security practices into software and infrastructure delivery workflows to support secure, reliable platform operations.
  • Collaborate across engineering, infrastructure, and AI teams to support high availability, disaster recovery, and strong customer outcomes.
  • Evaluate and integrate emerging AI tools and technologies from providers such as OpenAI, Anthropic, Google, Microsoft, and AWS where they align with platform needs.
  • Develop microservices and platform components in public cloud environments such as AWS, Azure, or GCP.
  • Use AI tools and platform capabilities in day-to-day engineering work to improve troubleshooting, automation, deployment workflows, and operational efficiency, while continuing to learn and apply new tools as they become relevant to the role.

Additional Job Responsibilities:

  • Support incident response, root cause analysis, and follow-up remediation efforts.
  • Document platform architecture, operational procedures, and engineering standards.
  • Contribute to platform roadmap discussions and technical planning activities.
  • Partner with data scientists and AI engineers to improve model training and deployment workflows.
  • Assist with evaluation of new frameworks, tooling, and observability capabilities.
  • Improve system visibility through dashboards, metrics, and log aggregation practices.
  • Participate in design reviews and cross-team technical discussions.
  • Contribute to continuous improvement of MLOps, DevOps, and SRE practices.

Expected Education & Experience:

  • Bachelor’s degree in Computer Science or an equivalent field, or equivalent professional experience.
  • 4 to 6 years of experience in Software Engineering, Data Engineering, MLOps, DevOps, SRE, or a related technical area.
  • Strong experience with Kubernetes, including designing, deploying, and maintaining enterprise-class ML models and services.
  • Proficiency in Python and experience developing microservices in public cloud environments such as AWS, Azure, or GCP.
  • Experience with ML platform and infrastructure technologies such as Terraform, Spark, service mesh architectures including Istio, and cloud security practices.
  • Experience deploying and maintaining Linux-based, scalable, fault-tolerant software platforms.
  • Experience with Azure AI Foundry or Amazon Bedrock, and familiarity with services such as LiteLLM, LangSmith, Arize, or Braintrust.
  • Experience with monitoring and observability tools such as Grafana, Prometheus, and CloudWatch.
  • Experience with databases and data platforms such as Snowflake, Postgres, MySQL, Redis, and DynamoDB.
  • Familiarity with CI/CD, configuration management, and orchestration tools such as Jenkins, Puppet, Bottlerocket, or Chef.
  • Experience working with Data Scientists and AI Engineers, including support for model training pipelines such as Kubeflow.

Expected Compensation

$145,000 - $247,000

The base salary range shown reflects the full range for this role from minimum to maximum. At athenahealth, base pay depends on multiple factors, including job-related experience, relevant knowledge and skills, how your qualifications compare to others in similar roles, and geographical market rates. Base pay is only one part of our competitive Total Rewards package - depending on role eligibility, we offer both short and long-term incentives by way of an annual discretionary bonus plan, variable compensation plan, and equity plans.

