Site Reliability Engineer (AWS & Terraform)

FUSTIS LLC
Atlanta, GA

Sr Software Engineer (Site Reliability Engineering)


Contract: 5+ months, with contract-to-hire (CTH) possibility

Location: Peachtree Dunwoody, Atlanta, GA. Hybrid, 2 days in the office; no relocation.

Pay rate: $70/hr on W2

US citizens (USC) or Green Card (GC) holders only.

Skills

AWS broadly: EC2 (though the team is moving away from it toward Fargate), ECS, CodePipeline, CodeBuild, CodeDeploy, RDS, DynamoDB, WAF, CloudFront

GitHub Actions; the team is moving away from AWS code connect

Terraform (must have): most of the day is spent there. CDK is a benefit, but Terraform is the hard requirement.

Terraform modules are a strong plus

Splunk, New Relic, and PagerDuty experience is great: the role involves a lot of alerting, dashboarding, capacity planning, and running an internal PaaS

Interview: First round is a quick screen with the hiring manager: an overview of the role, team structure, and projects, plus high-level technical questions. No coding; it leans toward high-level systems design.

Second round is with the tech leads: more systems design and potentially a live coding question. They keep an eye out for candidates using AI to answer, and assess culture fit as well.

Job Description:

This opening is for a Senior Site Reliability Engineer (SRE) on the Manheim Logistics SRE team. The SRE team is tasked with designing and maintaining AWS infrastructure and deployment pipelines for Manheim Logistics’ 15+ development teams. The team has currently standardized on a Docker-based infrastructure solution and is adding functionality to support new development team requests and architectural patterns (such as Lambda, Step Functions, Fargate, etc.). The SRE team has a strong focus on IaC with Terraform and on best practices such as least-privilege access and proactive monitoring and alerting. This role will work directly with a release train and help with IaC and SRE activities such as improving monitoring/alerting, defining an error budget, and assisting with DevSecOps.

As a Senior Site Reliability Engineer you will:

Bring strong automation experience: testing, deploying, monitoring, etc.

Take complex problems and come up with a technically reasonable solution

Work with and define SLOs, error budgets, etc.

Have innate curiosity about how things work

Design and assist in the authoring of software tools that reliably manage application delivery & performance

Design and assist in the setup and maintenance of application monitoring and alerting

Engage with engineering teams to ensure best practices are implemented

Improve predictability and reliability of software releases, workflows, and operating software

Reduce mean time to recovery (MTTR) by helping troubleshoot, monitor, alert, and automate recovery

Qualifications:

Bachelor’s degree in Computer Science or a related field and 3-5 years of work experience

Expertise in software development, with architecture/solutioning experience

Strong background with Terraform

Experience with AWS technologies, especially ECS and Lambda

Experience with monitoring/observability tools such as New Relic, Splunk, and PagerDuty

Experience with agile development, continuous integration and automated testing

Solid written communication, problem-solving, and process management skills

Preferred Skills:

Broad AWS platform skills including Cognito, WAF, ElastiCache (Redis), Elasticsearch, SNS, SQS, S3, Systems Manager

Experience automating Terraform at scale

Experience with database server infrastructure (RDS, MySQL, Postgres, etc.)

.NET Core development experience

GitHub Actions

Experience with GitHub, Docker, and Linux administration