Site Reliability Engineer

Verisk

Hyderabad, IN

We’re a small engineering team building and operating production services that must stay available across multiple regions, even when things go wrong. We’re looking for a pragmatic Site Reliability Engineer who can design, build, and operate resilient systems without unnecessary complexity.

This role is hands-on and collaborative: you’ll work closely with application engineers to make reliability a shared responsibility, not a gate.

Multi-Region Reliability & Availability (Primary Focus)

Design and operate multi-region architectures (active/active or active/passive)
Implement and improve automated failover and traffic routing
Identify and eliminate single points of failure
Ensure regional isolation and graceful degradation when dependencies fail

High Availability & Disaster Recovery

Define realistic availability goals and failure scenarios
Design and test backup and restore processes
Own disaster recovery plans and validate them through regular testing
Help the team understand RTO/RPO trade-offs

Observability & Incident Response

Build and maintain clear, actionable observability (metrics, logs, traces)
Create alerts that detect real problems without noise
Participate in on-call and help improve incident response
Lead or contribute to blameless postmortems and follow-up fixes

Automation & Operations

Reduce manual operational work through automation
Improve deployment safety (rollbacks, health checks, canaries where appropriate)
Manage infrastructure using infrastructure as code
Design systems that recover automatically whenever possible
