Site Reliability Engineer III

Jersey City, NJ

As a Site Reliability Engineer at JPMorgan Chase within the Enterprise Technology Liquidity Risk team, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team’s strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.

Job responsibilities

• Lead SRE adoption across teams, balancing feature delivery with efficiency and system stability

• Partner with peers and senior stakeholders to align on reliability goals and make trade-offs that improve outcomes

• Set and track reliability and stability metrics, and use data to drive measurable improvements

• Build a continuous-improvement culture by collecting real-time feedback and turning it into customer-impacting changes

• Coordinate with other teams to share solutions and prevent duplicated work

• Run blameless, data-driven post-mortems and regular debriefs to turn incidents (and wins) into learning

• Coach and develop entry- to mid-level engineers through hands-on guidance and feedback

Required qualifications, capabilities, and skills

• Formal training or certification in software engineering concepts and 5+ years of applied experience

• Advanced SRE knowledge and a proven track record implementing SRE practices across application and platform teams (including avoiding common pitfalls)

• Experience leading technologists to resolve complex, firmwide technology issues

• Ability to influence team culture by championing innovation and change

• Experience hiring, developing, and recognizing talent

• Proficiency in at least one programming language, with preference for JavaScript, Go, or Python

• Hands-on experience with CI/CD and infrastructure-as-code tools (e.g., Jenkins, GitLab, Terraform)

• Experience with containers and orchestration (e.g., Docker, Kubernetes, ECS)

• Troubleshooting experience with common networking technologies and issues

• Strong fundamentals across modern architectures and observability, including GraphQL (schema design, federation/supergraph), event-driven systems (Kafka concepts like partitions/consumer groups, DLQs, replay), microservices patterns (API gateways/routers, CQRS/event sourcing), and end-to-end telemetry using OpenTelemetry (metrics/logs/traces)
