Site Reliability Engineer III

Jersey City, NJ

As a Site Reliability Engineer at JPMorgan Chase within the Enterprise Technology, Liquidity Risk team, you are the owner and champion of non-functional requirements for the applications in your remit. You are a key influencer in your team’s strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.

Job responsibilities

• Lead SRE adoption across teams, balancing feature delivery with efficiency and system stability

• Partner with peers and senior stakeholders to align on reliability goals and make trade-offs that improve outcomes

• Set and track reliability and stability metrics, and use data to drive measurable improvements

• Build a continuous-improvement culture by collecting real-time feedback and turning it into customer-impacting changes

• Coordinate with other teams to share solutions and prevent duplicated work

• Run blameless, data-driven post-mortems and regular debriefs to turn incidents (and wins) into learning

• Coach and develop entry- to mid-level engineers through hands-on guidance and feedback

Required qualifications, capabilities, and skills

• Formal training or certification in software engineering concepts and 5+ years of applied experience

• Advanced SRE knowledge and a proven track record of implementing SRE practices across application and platform teams (including avoiding common pitfalls)

• Experience leading technologists to resolve complex, firmwide technology issues

• Ability to influence team culture by championing innovation and change

• Experience hiring, developing, and recognizing talent

• Proficiency in at least one programming language, with a preference for JavaScript, Go, or Python

• Hands-on experience with CI/CD and infrastructure-as-code tools (e.g., Jenkins, GitLab, Terraform)

• Experience with containers and orchestration (e.g., Docker, Kubernetes, ECS)

• Troubleshooting experience with common networking technologies and issues

• Strong fundamentals across modern architectures and observability, including GraphQL (schema design, federation/supergraph), event-driven systems (Kafka concepts like partitions/consumer groups, DLQs, replay), microservices patterns (API gateways/routers, CQRS/event sourcing), and end-to-end telemetry using OpenTelemetry (metrics/logs/traces)

Preferred qualifications, capabilities, and skills

• Strong hands-on ability to code and troubleshoot, with solid data fluency

 
