Title: Principal Platform Engineer
Location: Denver, CO (2–3 weeks onsite in our Denver office; the remaining time is remote within the U.S.)
Job Description – Contractor (Alerting & Monitoring Correlation Engines)
Location: Onsite, Denver, CO
Engagement: Initial 6‑week Statement of Work (SOW), with potential transition to a long‑term contract
Role Overview
We are seeking a highly skilled contractor to design, build, and deliver advanced alerting and monitoring correlation engines. This role requires hands‑on expertise in event‑driven architectures, AI‑compatible workflows, and integration across multiple operational tools. The contractor will work onsite with our engineering team in Denver to deliver a unified alerting service, SOP library, tenancy access model, SLA/SLO measurement framework, and escalation workflows.
Key Responsibilities
Alerting Service Development
Build a unified service that ingests Issues, Changes, and Threshold Warnings.
Implement severity assignment (P0–P3) per documented rules.
Configure routing to Freshservice, PagerDuty, Slack, webhooks, and email.
Track response/update/resolution timers (§1.3.1).
Deduplicate repeat firings into single tickets/incidents.
Enforce P0 hard‑silence rejection and external‑customer notification gates (15‑min confirmation, ≥0.85 confidence).
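For candidates gauging the scope, the deduplication, hard‑silence, and notification‑gate items above might look roughly like the sketch below. It is illustrative only and not our implementation: the names `Incident`, `Deduplicator`, `accept_silence`, and `may_notify_external` are hypothetical, the source-plus-condition fingerprint is an assumed dedup key, and reading the "15‑min confirmation" as "the condition has been open for at least 15 minutes" is an assumption. The documented rules in the SOW govern.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds taken from the gate described above.
CONFIRMATION_WINDOW = timedelta(minutes=15)
MIN_CONFIDENCE = 0.85


@dataclass
class Incident:
    fingerprint: str
    severity: str          # "P0" through "P3"
    first_seen: datetime
    last_seen: datetime
    firings: int = 1


class Deduplicator:
    """Collapses repeat firings of one condition into a single open incident."""

    def __init__(self) -> None:
        self._open: dict[str, Incident] = {}

    def ingest(self, source: str, condition: str, severity: str, at: datetime) -> Incident:
        # Assumed fingerprint: source plus alert condition.
        key = f"{source}:{condition}"
        incident = self._open.get(key)
        if incident is None:
            incident = Incident(key, severity, first_seen=at, last_seen=at)
            self._open[key] = incident
        else:
            # Repeat firing: update the existing incident instead of opening a new one.
            incident.firings += 1
            incident.last_seen = at
        return incident


def accept_silence(severity: str) -> bool:
    """P0 alerts can never be silenced; lower severities can."""
    return severity != "P0"


def may_notify_external(incident: Incident, confidence: float, now: datetime) -> bool:
    """Allow external-customer notification only after the confirmation window
    has elapsed and the confidence score meets the minimum."""
    confirmed = now - incident.first_seen >= CONFIRMATION_WINDOW
    return confirmed and confidence >= MIN_CONFIDENCE
```

In this sketch a second firing from the same source and condition returns the same `Incident` with `firings` incremented, and an external notification is held back until both gate conditions are met.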
SOP Library Creation
Develop SOPs for each distinct alert condition using the §1.4.2 template.
Assign ownership (Ops/Eng/Site Lead).
Document end‑to‑end walkthrough records.
Per‑Customer Tenancy Access
Design authenticated, read‑only, data‑scoped views for ticketing, audit trail, and operational state.
Ensure extensibility for additional customers without re‑architecture.
Continuous SLA/SLO Measurement
Implement metrics, dashboards, and reports for continuous compliance monitoring (§1.3.1, §1.3.2).
Provide operational interpretation guides.
Escalation Configuration
Configure PagerDuty L1→L2→L3 contact‑tier escalation per severity timing tables.
Required Skills & Experience
Proven expertise in event‑driven architectures and monitoring/alerting systems.
Strong experience with AI‑driven coding assistants (Claude Code, Codex, or similar).
Hands‑on integration with Freshservice, PagerDuty, Slack, webhooks, and email systems.
Solid understanding of SLA/SLO frameworks and operational compliance measurement.
Experience building multi‑tenant, secure access models.
Familiarity with SOP design and operational documentation.
Strong programming background (Python, Go, or Node.js preferred).
Knowledge of agentic AI workflows and compatibility design.
Preferred Qualifications
Prior experience in telecom, networking, or large‑scale infrastructure monitoring.
Background in correlation engines, anomaly detection, or AI‑driven alerting.
Ability to work onsite in Denver and collaborate closely with engineering leadership.
Engagement Details
Duration: 6 weeks initial SOW.
Extension: Potential transition to a long‑term contract based on performance and delivery.
Deliverables: Build and deliver the alerting service, SOP library, tenancy access model, SLA/SLO framework, and escalation configuration.