Senior DevOps Engineer

Lumicity
Santa Rosa, CA

Senior DevOps / Site Reliability Engineer

Location: San Francisco Bay Area (Hybrid)
Level: Senior
Type: Full-Time


The company

We're a Series B healthcare AI company whose revenue has grown substantially. More than 100 enterprise healthcare organizations use our platform to automate complex, compliance-critical operational workflows: the kind of work that used to require large manual teams and still carries serious downstream risk if it breaks.


We're about 100 people, well-funded, and at an inflection point: our platform is scaling fast, our engineering team is growing, and reliability is becoming mission-critical. This isn't a company that's been around long enough to accumulate decades of technical debt. You'd be building the right foundation from the start.


The role

We're hiring a Senior DevOps Engineer or Site Reliability Engineer — depending on where your experience and interests land.


Both roles sit within our engineering team, report into engineering leadership, and work closely with backend and ML engineers. The difference is in focus:

  • DevOps track: Infrastructure as code, CI/CD, deployment systems, developer experience, and platform reliability.
  • SRE track: Observability, incident management, SLO frameworks, and production reliability across distributed systems.


Whichever track you're on, this is a hands-on, high-ownership role. You'll have real production responsibility and real impact on how the platform performs at scale.


What you'll work on

  • Design and evolve AWS-based cloud infrastructure using Terraform
  • Own and improve CI/CD pipelines (GitHub Actions) for fast, safe deployments
  • Standardize deployment patterns across serverless workloads (Lambda), containerized services (ECS), and workflow orchestration systems
  • Define observability standards across metrics, logs, and traces using OpenTelemetry, Datadog, Grafana, and Sentry
  • Build proactive detection for reliability risks, latency regressions, and performance degradation
  • Partner with backend and ML teams to debug distributed system issues, including Postgres performance
  • Lead and support incident response and root cause analysis
  • Automate security and compliance workflows (access controls, audit readiness, vulnerability management)
  • Participate in the on-call rotation


What we're looking for

Must have:

  • 7+ years in DevOps, SRE, or infrastructure engineering in a B2B SaaS environment
  • Strong production AWS experience
  • Deep hands-on Terraform (IaC) experience
  • CI/CD pipeline ownership (GitHub Actions or equivalent)
  • Experience with serverless and containerized services in production
  • Postgres in production (performance, tuning, operations)
  • Observability tooling: metrics, logs, traces — and the ability to turn signals into action
  • Scripting fluency (Python, Bash, or similar)
  • High-ownership mindset: you're not waiting to be assigned an incident; you're already thinking about failure modes

Nice to have:

  • Experience in healthcare, fintech, or other regulated environments
  • ClickHouse or high-scale analytics systems
  • OpenTelemetry and modern observability architecture
  • ML infrastructure experience


Why join now

  • Define reliability and infrastructure standards before they calcify
  • Tight collaboration with product, backend, and ML — no siloed infra team
  • Meaningful equity in a company with strong investor backing and real market traction
  • Modern cloud-native stack: AWS, Terraform, GitHub Actions, ECS, Lambda, Aurora Postgres, Datadog, OpenTelemetry


Interested or know someone who might be? Apply below or reach out directly.
