Senior Staff Engineer

eTeam
Redwood City, CA

Job Title: Senior Staff Engineer

Duration: 6 Months

Location: Redwood City, CA

Pay Range: $98.00 - $101.00/hr on W2, all-inclusive, without benefits


Job Description:

The Senior Staff Engineer for NPE Observability is the preeminent technical strategist for our global telemetry fabric. In this senior contract role, you will bridge the gap between high-scale distributed software and global network hardware, driving the architectural standards for our most complex data-intensive initiatives. You will own the technical integrity of our streaming pipelines, ensuring telemetry from the global fleet is ingested, normalized, and processed with sub-second latency. As a master of our tech stack (Java, Kafka, Postgres, Grafana), you will define the "Gold Standard" for technical excellence within the Network Platform Engineering (NPE) group.

Responsibilities:

Architectural Strategy & Technical Vision

• Core Stack Evolution: Architect and optimize our primary ingestion and storage engines utilizing Java and PostgreSQL, ensuring high availability and performance at scale.

• Real-Time Data Orchestration: Lead the design of high-throughput messaging systems using Apache Kafka to handle trillions of telemetry points with sub-second latency.

• Unified Visibility: Define the global standard for observability visualization in Grafana, building complex, high-performance dashboards that aggregate data from diverse telemetry sources.

High-Scale Engineering & Innovation

• Stream Processing Mastery: Architect massively parallel processing pipelines and stateful stream processing frameworks (utilizing tools like Apache Flink) to enable real-time anomaly detection.

• Advanced R&D: Evaluate and prototype emerging technologies such as Model-Driven Telemetry (MDT) and ClickHouse/Thanos for long-term metric storage and high-cardinality data analysis.

• Technical Roadmap Ownership: Drive the engineering team toward key milestones, ensuring the code we ship aligns with the long-term (3–5 year) NPE vision.

Reliability & Systemic Leadership

• Service Standards: Define and monitor critical SLI/SLO metrics (e.g., P95 response times) to ensure the platform maintains world-class performance and global ITIL compliance.

• Incident Authority: Serve as the senior point of contact for complex root-cause analysis, identifying architectural weaknesses in the Java/Kafka/Postgres stack to prevent future outages.

• Stakeholder Synthesis: Translate complex product requirements into deep technical specifications, managing relationships with both internal software teams and external network vendors.

Required Qualifications & Experience

Tenure: 10+ years of professional experience in software engineering and distributed systems.

Domain Expertise: 5+ years of experience specifically in large-scale network engineering, telemetry, or observability platforms.

Java Expertise: Mastery of Java for building high-performance, scalable backend services.

Data & Messaging: Deep expertise in PostgreSQL (schema design and tuning) and Apache Kafka (cluster architecture and stream management).

Visualization: Expert-level proficiency in Grafana for creating enterprise-level observability dashboards.

Large-Scale Systems: Proven experience with Prometheus, Thanos, or ClickHouse, and experience working within a structured Agile/Scrum environment.

Education: Bachelor’s or Master’s degree in Computer Science or a related technical field.
