Director of Site Reliability Engineering (SRE)
We’re partnering with an exciting, fast-growing observability startup that’s building next-generation tooling to help engineering teams better understand, monitor, and optimise complex systems. With strong product-market fit, rapid customer adoption, and significant backing, they’re entering a critical phase of growth and are looking for a Director of SRE to lead their reliability function.
This is a rare opportunity to join a company that lives and breathes reliability, and to shape how modern systems are monitored and maintained at scale.
The Role
As Director of SRE, you’ll own the reliability, scalability, and performance of a platform used by engineering teams worldwide. You’ll play a key role in building resilient infrastructure while also shaping the internal culture around observability and operational excellence.
You will:
- Build and lead a world-class SRE team in a high-growth environment
- Define and implement SLIs, SLOs, and SLAs across the platform
- Partner closely with product and engineering teams to embed observability best practices
- Drive improvements in system performance, availability, and scalability
- Lead incident response, post-incident reviews, and reliability initiatives
- Champion automation and tooling to enhance system visibility and operational efficiency
What We’re Looking For
- Proven experience leading SRE or platform teams in high-scale environments
- Strong background in cloud infrastructure (AWS, GCP, or Azure)
- Deep understanding of observability tooling (metrics, logs, tracing)
- Experience with distributed systems and microservices architectures
- Track record of improving uptime, reliability, and system performance
- Strong leadership, communication, and cross-functional collaboration skills
Nice to Have
- Experience working in observability, DevTools, or infrastructure-focused companies
- Hands-on experience with Kubernetes and containerised environments
- Background in software engineering
- Experience scaling teams during periods of rapid growth
Why Join?
- Join a category-defining observability company at a key growth stage
- High-impact leadership role with ownership over reliability strategy
- Work alongside a highly technical, product-focused team
- Competitive compensation, equity, and benefits
- Fast-paced, collaborative, and engineering-driven culture
Location
- Downtown San Francisco (hybrid working options available)