Site Reliability Engineer

Charles Schwab Inc.
Westlake, TX

Your Opportunity

At Schwab, you’re empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together.

We believe in the importance of in-office collaboration and fully intend for the selected candidate for this role to work on site in the specified location(s).

As a Site Reliability Engineer (SRE) in Client Data Technology (CDT), you will lead System Availability Engineering (SAvE) teams for CDT applications, playing a critical role in ensuring the availability of CDT ecosystems and guiding the development, automation, tooling, and realization of SRE best practices. You will drive our adoption of cloud technologies, ensuring that those systems adhere to the SRE standards that we have built with our on-premises infrastructure. Through this role you will exercise your influence to ensure new cloud applications are effectively developed, deployed, and maintained.

You will be responsible for:

  • Identifying tactical and strategic opportunities to improve service health, performance, reliability, and telemetry across the CDT platform, with a focus on systems implemented in the cloud
  • Leading the design, architecture, and implementation of an availability and resiliency roadmap that delivers modernized tooling and metrics to help meet MTTD, MTTR, and general availability goals
  • Working closely with the development team to define a sustainable operating model for CDT cloud applications, focusing on platform scale, availability, fault tolerance, and performance to ensure repeatability, consistency, and portability
  • Driving a shift-left mindset and influencing architectural decisions to ensure resiliency and scale at the outset of the software development process

What you have

Required qualifications:

  • Bachelor's degree in Computer Engineering, Computer Science, or related field
  • 8+ years of software development and site reliability engineering experience supporting production applications in Google Cloud Platform (GCP)
  • 6+ years in DevOps engineering leadership focusing on complementing production operations with automation and tooling initiatives
  • 4+ years of technical leadership, supporting and developing highly technical individuals and driving efficiencies
  • 3+ years of experience defining, driving, and implementing operational best practices (SLOs, SLIs, error budgets, error monitoring, capacity planning, blameless postmortems, and toil management)
  • Proficiency in programming languages and tools to automate repeatable processes and build infrastructure-as-code (IaC) solutions (Python, CloudFormation, Terraform)
  • Knowledge of databases (SQL, Aerospike, Postgres preferred)
  • Knowledge of RabbitMQ and Kafka

Preferred qualifications:

  • 6+ years of technical leadership, supporting and developing highly technical individuals and driving efficiencies
  • Drives thought leadership with development teams to encourage building cloud systems that are maintainable on day one