Senior Site Reliability Engineer

Smart IT Frame LLC
Reston, VA

Title: Senior Site Reliability Engineer

Location: Reston, VA

Job type: Full time


We are seeking a Senior Site Reliability Engineer to help lead a team responsible for building, managing, maintaining, and scaling the Linux infrastructure on which our mission-critical services depend. We value engineering professionals whose strong technical skills and attitude motivate those around them. We seek individuals with a sense of purpose, for whom attention to detail and craftsmanship are essential; individuals who are curious, demonstrate initiative, and love solving problems.


Responsibilities:

• Oversee deployments of solutions that combine open source components, COTS (commercial off-the-shelf) components, and custom-developed components

• Deploy, configure, and maintain services in production, QA and development environments, on various platforms

• Develop advanced deployment automation using tools such as Ansible

• Develop automation and configuration management solutions using tools such as Jenkins

• Secure automated deployments using tools such as HashiCorp Vault

• Secure infrastructure with micro-segmentation tools such as Guardicore

• Coordinate with other technical staff to implement systems and software

• Perform operations support functions, including problem isolation and resolution

• Document processes, procedures, configurations, and deployment plans

• Provide critical technical leadership in the areas of operational process and change management, and mentor less experienced engineers

• Participate in a 24x7 on-call rotation


Our ideal candidate will have a unique blend of impeccable technical skills and a desire to work as part of a team to grow themselves and those around them. The team deploys, engineers, and operates enabling infrastructure in support of mission-critical customer applications.


The candidate must have:

• Bachelor's degree in computer science or a related technical field, or equivalent combination of education and experience

• 10+ years of experience developing and operating mission-critical systems

• Excellent understanding of Linux configuration and administration

• Strong experience with a high-level scripting language such as Python

• Strong automation experience - not just developing automation, but knowing why we automate and what to automate

• Strong understanding of infrastructure-as-code

• Strong written and verbal communication skills - able to clearly and succinctly describe complex issues

• Strong understanding of network protocols and security

• Experience developing mature solutions for monitoring and reporting using tools like Grafana and Splunk

• Familiarity with development tools such as GitHub, Jira, and Confluence


Desired Skills, Experience, and Attributes:

• Deployment automation experience using tools such as Ansible

• Experience using Jenkins in a continuous integration and delivery environment

• Experience with virtualization platforms such as oVirt and VMware in a production environment

• Experience with HTTP proxies such as Squid

• Experience with Red Hat Enterprise Linux

• Experience with CMDB and ITIL platforms such as ServiceNow

• Experience with Red Hat Identity Management and/or FreeIPA

• Experience with Linux authentication via Active Directory

• Experience administering Linux and Unix systems in a large-scale environment

• Experience working with teams using Scrum is a plus
