Sr. Manager, SRE/ITOps

Panera Bread
Newton, MA

Panera, LLC is seeking a talented Sr. Manager, SRE/ITOps to lead our Site Reliability Engineering / IT Operations function. This role is responsible for building and mentoring a team of engineers (direct reports), operating and continuously improving production platforms and services, and partnering with Engineering, Security, and Product to deliver reliable, scalable, and cost-effective systems.

In this role, you will be a people leader who sets direction and drives execution across reliability engineering and day-to-day operations. You will establish SLOs/SLIs, lead incident response and continuous improvement, and own the Major Incident Management (MIM) process to ensure clear command, communications, and rapid service restoration for high-severity events. You will ensure internal and external services meet reliability, performance, and security expectations while upholding strong engineering and operational excellence principles.

Responsibilities include:

  • Lead, coach, and develop a team of SRE/ITOps engineers (direct reports), including hiring, onboarding, performance management, career development, and succession planning.
  • Own operational readiness for production services: capacity planning, change/release readiness, resiliency reviews, and launch approvals in partnership with Engineering and Product.
  • Define and manage service level objectives and indicators (SLOs/SLIs) and error budgets; monitor availability, latency, and overall system health; and drive improvements based on data and customer impact.
  • Drive automation to reduce toil and improve scalability, resiliency, and efficiency (infrastructure as code, configuration management, CI/CD enablement, and self-service operational tooling).
  • Lead incident management, on-call operations, and escalation processes, including ownership of the Major Incident Management (MIM) program: declare/triage severity, run major incident bridges/war rooms, drive cross-team coordination, provide timely stakeholder communications, and facilitate blameless postmortems to ensure corrective actions are prioritized, tracked, and completed.
  • Establish and continuously improve Major Incident Management standards and readiness (playbooks, roles/RACI, tooling, training and drills). Track MIM KPIs (MTTA/MTTR, incident frequency/severity) and partner with Engineering and Service Management on problem management and recurring-incident elimination.
  • Manage operational backlogs and service improvement plans; partner with Security/Compliance to meet audit and control requirements; and manage vendors/tools as needed.

Requirements:

  • 7+ years of experience in SRE, production operations, DevOps, or infrastructure engineering, with demonstrated ownership of highly available services.
  • 2+ years of people management experience (or team lead experience with direct coaching responsibility), including hiring and developing engineers.
  • Experience operating cloud and/or hybrid environments (IaaS/PaaS, microservices), including observability, incident response, capacity planning, and reliability engineering practices.
  • Hands-on technical depth across systems, networking, security, and databases; ability to dive deep when needed and guide design/operational decisions.
  • Proficiency with automation, orchestration, and infrastructure as code (e.g., Terraform/CloudFormation, Ansible/Chef/Puppet/Salt, containers/Kubernetes).
  • Experience with CI/CD practices and operational governance (change management, release management, environment hygiene), balancing delivery speed with reliability.
  • Strong analytical, troubleshooting, and communication skills, with the ability to align stakeholders during incidents and drive cross-team execution.

Preferred Skills:

  • Experience designing, analyzing, and operating large-scale distributed systems, including disaster recovery and business continuity planning (RTO/RPO).
  • Strong observability background (monitoring, logging, tracing) and experience with APM tooling such as Dynatrace, New Relic, AppDynamics, Datadog, Splunk, or similar.
  • Demonstrated ability to influence without authority, lead through ambiguity, and partner effectively with Engineering, Security, and business stakeholders.
  • Experience establishing operational processes (incident, problem, change) and service management practices (ITIL familiarity a plus).
  • Budgeting, vendor management, and tool lifecycle management experience (selection, procurement partnership, renewals, and value realization).
  • Experience building or operating systems in a secure, regulated, or compliant environment (e.g., SOX, PCI, SOC 2), including audit support and control remediation.
  • A passion for automation and operational excellence, and experience partnering with engineering teams in a DevOps/SRE culture.

Additional Description:

Competitive Pay: $155,477 - $186,572 annually
