Data Center Network Engineer

GTN Technical Staffing
Dallas, TX

Senior Network Engineer – Data Center / HPC Infrastructure

Location: Dallas, TX (Hybrid)

Type: Direct Hire

• Competitive base salary + performance bonus

• 100% company-paid benefits


**This position requires applicants to be currently authorized to work in the U.S. without employer sponsorship. We are unable to sponsor or take over sponsorship of employment visas at this time.**


Overview

We are seeking a Senior Network Engineer to support the design, implementation, and operation of large-scale, high-performance data center networks powering HPC and AI workloads.

This role is focused on building and maintaining ultra-low-latency, high-throughput network infrastructure supporting GPU- and CPU-intensive compute environments. The position plays a critical role in ensuring network performance, scalability, and reliability across distributed data center environments.

The ideal candidate brings deep expertise in modern data center networking, hands-on experience with high-performance fabrics, and a strong understanding of networking requirements for HPC and AI workloads, including east-west traffic optimization, congestion management, and performance tuning.

Key Responsibilities

Data Center & HPC Network Engineering

• Design, implement, and operate high-performance data center networks supporting HPC and AI workloads

• Optimize network architectures for east-west traffic patterns, low latency, and high throughput across compute clusters

• Support large-scale GPU and CPU environments, ensuring consistent performance under heavy distributed workloads

• Contribute to network design supporting AI/ML training, simulation, and high-performance data processing

Data Center Fabric & Architecture

• Design and manage leaf-spine / Clos architectures leveraging EVPN-VXLAN overlays

• Support high-performance interconnects including DCI (Data Center Interconnect) and backbone connectivity

• Implement scalable multi-tenant network designs supporting workload isolation and segmentation

• Support WAN and interconnect strategies including cloud on-ramps and hybrid connectivity

Performance, Reliability & Optimization

• Monitor and tune network performance for latency, throughput, and congestion across HPC environments

• Perform deep packet analysis, traffic flow analysis, and root cause investigation for performance bottlenecks

• Support capacity planning and scaling strategies aligned with compute growth and workload demand

• Ensure high availability through redundancy design, failover testing, and operational rigor

Automation & Infrastructure Engineering

• Develop and maintain network automation frameworks using Python, Ansible, Git, and Jinja2

• Implement Infrastructure-as-Code (IaC) practices and CI/CD pipelines for network deployment and changes

• Drive standardization and repeatability across data center network builds and configurations

Observability & Telemetry

• Implement telemetry, monitoring, and observability solutions to provide real-time network visibility

• Analyze network metrics to proactively identify risks and optimize performance

• Integrate network telemetry into broader infrastructure monitoring and analytics platforms

Cross-Functional Collaboration

• Partner with HPC platform, compute, storage, and infrastructure teams to align network architecture with workload demands

• Collaborate with Solutions Architecture and Engineering teams on design and deployment of new environments

• Work closely with vendors and hardware partners to validate performance and interoperability

Leadership & Technical Ownership

• Serve as a senior technical resource during incidents and critical escalations

• Mentor junior engineers and contribute to documentation, standards, and best practices

• Drive continuous improvement across network architecture, operations, and tooling

Required Experience

• 5–8+ years of experience designing and supporting large-scale data center networks

• Strong experience with modern data center fabrics (leaf-spine / Clos architectures)

• Deep expertise with EVPN, VXLAN, BGP, and MPLS

• Hands-on experience with Cisco (NX-OS, IOS-XR) and/or Arista (EOS) platforms

• Experience supporting high-performance environments (HPC, AI/ML, or hyperscale infrastructure)

• Strong understanding of network performance optimization for low-latency, high-bandwidth workloads

• Proven troubleshooting experience in complex, distributed network environments

Technical Skills

• Network automation: Python, Ansible, Jinja2, Git

• Infrastructure-as-Code (IaC) and CI/CD pipelines

• Network observability, telemetry, and performance monitoring tools

• Packet analysis and traffic flow diagnostics

Preferred Experience

• Experience with HPC networking concepts and architectures (GPU clusters, distributed training environments)

• Familiarity with InfiniBand or RDMA/RoCE networking

• Experience in hyperscale or AI-focused data center environments

• CCNP or equivalent certification; CCIE or other advanced certifications a plus
