We are placing a Senior Azure Platform Engineer with a centralized Platform Engineering team at one of the world's largest financial institutions. The team owns the internal Azure platform that application teams across the bank depend on, and it is in a critical phase: scaling toward enterprise-wide General Availability while expanding into multi-cloud.
You'll build, harden, and operate real production infrastructure. You'll own incidents, run root cause analysis, and improve the platform so the same failure doesn't happen twice.
What you'll do
• Own Terraform-based infrastructure across multi-environment and multi-subscription Azure setups — state management, drift detection, remediation, and module design
• Operate AKS clusters in production — pod and node troubleshooting, scaling, ingress issues, cluster upgrades, and incident response
• Implement and enforce Azure Policy at management group and subscription scope, including deny and audit effects and active remediation
• Design and maintain platform security controls: Managed Identity, RBAC at control and data plane, Key Vault, Entra ID, and secure service-to-service communication
• Own production incidents end-to-end — triage, root cause analysis, resolution, and prevention
• Build observability into the platform: logging strategy, alerting, container monitoring, and AKS diagnostic tooling
• Partner with application teams on platform onboarding, automation patterns, and best practices
• Contribute to platform hardening and standardization as the platform scales to support more teams
What you bring
• Terraform ★ - Remote state management, state locking, drift detection and remediation, multi-environment module design, recovery and import scenarios. This is the highest-priority requirement.
• AKS / Kubernetes - Production cluster operations: pod and node troubleshooting, resource exhaustion, ingress, rollbacks, cluster upgrades. You need operational stories, not theory.
• Azure Policy & governance - Hands-on policy enforcement (deny, audit, modify) at management group and subscription scope. Remediation tasks and compliance reporting.
• Security & identity - Managed Identity (system-assigned vs. user-assigned), RBAC at control and data plane levels, Key Vault, Entra ID, JWT/OAuth, secure inter-service communication.
• Observability & RCA - Log Analytics, Azure Monitor, Prometheus, Grafana, Splunk, or ELK. Full incident triage from symptom to resolution to prevention.
• Azure platform services - AKS, App Service, Azure Functions, Storage, VNets, Private Endpoints, NSGs, Azure Firewall, all with hands-on production experience.
• API Management - Policy configuration: throttling, rate limiting, auth enforcement, request/response transformations, observability integration.