Sr Staff Site Reliability Engineer

Satine Technologies
Atlanta, GA

About the Role

You'll own the reliability posture of a large-scale healthcare platform. That means infrastructure design, deployment pipelines, observability, incident response, and the hard conversations about when something isn't production-ready. You'll work alongside software engineers and security engineers who are building real capabilities - your job is to make sure what they build actually runs.

This isn't a ticket-queue SRE role. At this level, we expect you to define what good looks like and pull the team toward it.

What You'll Do

- Design and own the infrastructure architecture for a cloud environment: multi-region, high-availability, built for real operational load
- Set reliability standards: SLOs, error budgets, incident response playbooks, runbooks
- Lead the observability practice - define what gets measured, how, and what gets done about it
- Own CI/CD pipeline architecture and deployment strategy across environments
- Be the senior technical voice in design reviews when reliability, scalability, or operational risk is on the table
- Mentor Staff-level engineers - raise the floor on how the team builds and operates systems
- Participate in on-call rotation and lead incident response for platform issues
- Partner with security engineers to ensure infrastructure meets security and compliance requirements without making the platform slow to ship

What We're Looking For

Required:

- 10+ years of SRE, platform engineering, or infrastructure engineering experience
- Expert-level Kubernetes - you've designed and operated production clusters, not just deployed to them
- Deep Terraform and infrastructure-as-code experience at scale
- Strong CI/CD pipeline design and implementation experience
- Experience operating production systems in a major cloud platform (AWS, Azure, or GCP)
- US citizenship or Lawful Permanent Resident status (Public Trust eligibility required)

Paths In - You Might Be a Fit If You:

- Have been the most senior SRE on a team and found yourself setting architecture direction, not just executing on it
- Come from a hyperscaler, high-growth startup, or product company and want to apply that scale experience to systems where the stakes are higher than uptime SLAs
- Have been carrying a team's platform reliability on your back informally and want a title and scope that match what you're actually doing
- Are a strong infrastructure engineer who wants to work on something more meaningful than the next product sprint

Helpful but Not Required:

- Experience with Kafka, Prisma, or event-driven microservices architectures
- Familiarity with security or compliance frameworks (FedRAMP, NIST 800-53, SOC 2, or similar)
- Experience mentoring or technically leading a distributed engineering team
- Prometheus, Grafana, ELK or similar observability stack experience

About Satine Technologies

Our mission is to protect the institutions that underpin free society from cyber threats. We're a small, mission-driven team that works on problems that matter - from offensive security testing for hospitals and banks to building capabilities for national security missions.

We invest in people who invest in themselves. This isn't a body shop. You'll work with a team that takes pride in technical craft and cares about developing the people who join us.

Benefits

- Health insurance with vision, dental, and HSA
- Life insurance (100% employer-funded)
- 401(k) with 4% match
- Flexible PTO

To all recruitment agencies: Satine Technologies does not accept agency resumes.
