Practice Architect II, SRE

TEKsystems
Hanover, MD

Description

Think of TEKsystems Global Services (TGS) as the growth solution for enterprises today. We unleash growth through technology, strategy, design, execution and operations with a customer-first mindset for bold business leaders. We deliver cloud, data and customer experience solutions. Our partnerships with leading cloud, design and business intelligence platforms fuel our expertise. We value deep relationships, dedication to serving others and inclusion. We drive positive outcomes for our people and our business, and we stay true to our commitments and act in harmony with our words. We exist to create significant opportunity for people to achieve fulfillment through career success. Ready to join us?

Here’s what the opportunity supported through our TGS Talent Acquisition Team requires:

We are seeking a Principal Architect to lead the technical vision, architecture, and evolution of our Kubernetes-based infrastructure platforms supporting large-scale, GPU-enabled workloads. This role is responsible for defining end-to-end platform strategy across on‑premises and cloud environments, driving architectural standards, and partnering closely with engineering, SRE, and operations teams to deliver highly reliable, scalable infrastructure as a service to internal customers.

As a Principal Architect, you will operate at both the systems and organizational level—setting long-term technical direction, influencing platform adoption, and ensuring architectural decisions align with business priorities, reliability goals, and future growth.

•Own the end-to-end architectural vision for Kubernetes platforms spanning on‑prem and cloud environments, with a strong emphasis on scalability, resiliency, and operational excellence.

•Define and evolve reference architectures for:

-Multi-cluster and multi-tenant Kubernetes environments

-GPU-enabled workloads and high-performance computing (HPC) use cases

-Hybrid infrastructure (on‑prem + AWS)

•Establish architectural standards, design principles, and best practices for platform services, networking, security, and observability.

Technical Leadership & Influence

Act as a technical authority and advisor across SRE, platform engineering, Cloud Foundations Automation (CFA), and service teams.

Lead architecture reviews and guide teams on complex design decisions involving distributed systems, networking, and workload orchestration.

Mentor senior engineers and architects, raising the overall technical bar of the organization.

Drive alignment across teams by translating business needs into scalable technical solutions.

Kubernetes & Infrastructure Design

Provide architectural oversight for:

Kubernetes cluster lifecycle automation (provisioning, upgrades, scaling)

CI/CD-driven application and platform deployments using Helm, Kustomize, and platform-as-code approaches

Guide design choices for workload managers and schedulers (e.g., Slurm, Run:AI) in GPU-heavy environments.

Influence strategies for infrastructure automation using Terraform and custom tooling.

Reliability, Observability & Operations

Partner with SRE and SRO teams to ensure platforms meet availability, performance, and supportability targets.

Define observability architecture across metrics, logging, and tracing for complex distributed systems.

Ensure operational readiness through architecture that supports effective troubleshooting, runbooks, and handoffs to operations teams.

Business Impact & Enablement

Enable internal customers by delivering infrastructure as a service that is reliable, well-documented, and easy to consume.

Support new strategic initiatives by architecting platforms and workflows for incoming projects.

Balance innovation with operational stability, ensuring architectural decisions scale with organizational growth.

Required Qualifications:

10+ years of experience in platform engineering, infrastructure architecture, SRE, or distributed systems roles.

Deep expertise in Kubernetes architecture, including on‑prem deployments and production-scale environments.

Strong understanding of Linux systems internals, performance tuning, and troubleshooting.

Advanced knowledge of networking fundamentals (L3/L4, DNS, load balancing, VPC networking).

Proficiency in one or more programming languages such as Go, Python, or Bash, with experience designing automation frameworks or platform services.

Proven experience architecting large-scale distributed systems with high reliability requirements.

Demonstrated ability to influence technical direction across multiple teams without direct authority.

Bachelor’s degree in Computer Science or a related technical field (or equivalent experience).

Preferred / Nice-to-Have Qualifications:

Experience with GPU workload management and scheduling (Slurm, Run:AI).

Architecture experience supporting multi-cluster, multi-tenant Kubernetes at scale.

Familiarity with distributed storage systems such as Lustre or VAST Data, particularly in HPC or ML environments.

Experience designing platforms for observability at scale.

Contributions to open-source projects related to Kubernetes, cloud-native ecosystems, or HPC tooling.

Prior experience in environments where the majority of infrastructure is on‑prem, with hybrid cloud integration.

Business Drivers & Customer Impact:

Architect and evolve a centralized Kubernetes platform used by internal engineering teams.

Ensure successful onboarding and long-term reliability of services developed by the CFA team.

Enable rapid delivery of new projects by providing well-architected, production-ready infrastructure.

Serve as a strategic partner to SRE and SRO teams, ensuring platforms are operationally sound and supportable at scale.

Additional Skills & Qualifications

Candidates ideally have most of the following:

• AWS / Azure / GCP

• Python

• Linux

• Puppet / Chef / Ansible

• Terraform

• Docker / Kubernetes

• CI/CD (Automation, Metrics)

• Observability (Datadog / Dynatrace / Sysdig / Aqua)

• SIEM

Experience Level

Expert Level

Job Type & Location

This is a Permanent position based out of Hanover, MD.

Pay and Benefits

The pay range for this position is $148,200.00 - $222,400.00/yr.

We reserve the right to pay above or below the posted wage based on factors unrelated to sex, race, or any other protected classification. Additional earnings may be available through incentive programs like annual bonuses, profit sharing, etc. Our full-time, internal employment benefits include the following:
Medical, Dental, and Vision
Critical Illness, Accident, and Hospital
401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
Life Insurance (Voluntary Life and AD&D for employee and dependents)
Short and Long-Term Disability
Health Spending Account (HSA)
Transportation Benefits
Employee Assistance Program
Time Off/Leave (PTO, Vacation or Sick Leave)

Workplace Type

This is a fully remote position.

Application Deadline

This position is anticipated to close on Apr 17, 2026.

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

About TEKsystems and TEKsystems Global Services

We’re a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We’re a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We’re strategic thinkers, hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We’re building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.

