Title: Senior Site Reliability Engineer
Location: Reston, VA
Job type: Full time
We are seeking a Senior Site Reliability Engineer to help lead a team responsible for building, managing, maintaining, and scaling the Linux infrastructure on which our mission-critical services depend. We value engineering professionals who combine strong technical skills with an attitude that motivates those around them. We seek individuals with a sense of purpose, for whom detail and craftsmanship are essential; individuals who are curious, demonstrate initiative, and love solving problems.
Responsibilities:
• Oversee deployments of solutions that combine open source components, commercial off-the-shelf (COTS) components, and custom-developed components
• Deploy, configure, and maintain services in production, QA, and development environments on various platforms
• Develop advanced deployment automation using tools such as Ansible
• Develop automation and configuration management solutions using tools such as Jenkins
• Secure automated deployments using tools such as HashiCorp Vault
• Secure infrastructure with micro-segmentation tools such as Guardicore
• Coordinate with other technical staff to implement systems and software
• Perform operations support functions, including problem isolation and resolution
• Document processes, procedures, configurations, and deployment plans
• Provide critical technical leadership in the areas of operational process and change management, and mentor less experienced engineers
• Participate in a 24x7 on-call rotation
Our ideal candidate will have a unique blend of impeccable technical skills and a desire to work as part of a team, growing both themselves and those around them. The team deploys, engineers, and operates enabling infrastructure in support of mission-critical customer applications.
The candidate must have:
• Bachelor's degree in computer science or a related technical field, or equivalent combination of education and experience
• 10+ years of experience developing and operating mission-critical systems
• Excellent understanding of Linux configuration and administration
• Strong experience with a high-level scripting language such as Python
• Strong automation experience: not just developing automation, but knowing why we automate and what to automate
• Strong understanding of infrastructure-as-code
• Strong written and verbal communication skills, with the ability to describe complex issues clearly and succinctly
• Strong understanding of network protocols and security
• Experience developing mature monitoring and reporting solutions using tools such as Grafana and Splunk
• Familiarity with development tools such as GitHub, Jira, and Confluence
Desired Skills, Experience, and Attributes:
• Deployment automation experience using tools such as Ansible
• Experience using Jenkins in a continuous integration and continuous delivery (CI/CD) environment
• Experience with virtualization platforms such as oVirt and VMware in a production environment
• Experience with HTTP proxies such as Squid
• Experience with Red Hat Enterprise Linux
• Experience with CMDB and ITIL platforms such as ServiceNow
• Experience with Red Hat Identity Management (IdM) and/or FreeIPA
• Experience with Linux authentication via Active Directory
• Experience administering Linux and Unix systems in a large-scale environment
• Experience working on teams that use Scrum is a plus