Senior AI Platform Engineer
Job Description
We are building an AI Engineering function to enable productivity and agentic capabilities across the firm, for end users, developers, and business teams.
As a Senior AI Platform Engineer, you will design and own the shared platform that powers AI systems firm-wide: inference services, agentic platforms, developer tooling, and observability.
This is a financial services environment where data protection, auditability, and regulatory compliance are foundational requirements. You will ensure that AI capabilities are secure by default, auditable end-to-end, and easy for engineering teams to adopt.
You will report to the Head of AI Engineering and partner closely with Security Engineering, AI Integration/Application teams, and core infrastructure groups.
Responsibilities
Platform Infrastructure
- Design, build, and operate the core AI platform, including managed LLM inference services (Amazon Bedrock and related services), model access management, versioning, and routing across foundation models
- Design and operate shared integration layers, including Model Context Protocol (MCP) servers, an MCP registry/gateway, and authorization services that connect AI platforms with core firm systems
- Design and operate AI productivity data pipelines and dashboards for usage, cost, and adoption metrics
- Design the infrastructure that supports AI-assisted developer tooling (Linux VDI environments), office productivity integrations (M365/Excel), and autonomous agent frameworks
- Develop standardized inference and agentic AI platforms that teams can adopt across use cases, including reusable components for retrieval-augmented generation (RAG), vector databases, and model integration patterns
Security & Guardrails
- Partner with Security Engineering to embed security controls across the full AI lifecycle
- Design, with the AI Security Engineer and infrastructure/platform teams, the controls that prevent destructive agent actions: filesystem permissions, IAM policies, network allowlists, sandbox configurations, and execution-time policy enforcement
- Architect a default-deny posture: agents and tools access only explicitly permitted resources, with no ability to modify or delete production data unless specifically authorized through a human-approval workflow
- Implement pre-execution guardrails (hooks, policy engines) that intercept and validate agent actions before they run
- Ensure AI workloads operate within the corporate network boundary, using VPC endpoints and AWS PrivateLink, with no public internet egress for inference traffic
Enablement & Scale
- Build self-service onboarding so teams can consume AI platform services with appropriate access controls
- Design systems that enable cost-effective operation of AI workloads, including quota management and chargeback visibility
- Operate firm-wide AI applications and centrally managed AI services
- Define reference architectures and patterns that other engineering teams use to build on the platform
Qualifications
- 10+ years as an infrastructure, platform, or systems engineer, with demonstrated experience building and operating shared services consumed by multiple teams, on-premises and on AWS
- Strong expertise in Amazon Bedrock (inference and AgentCore) and Azure OpenAI
- Strong expertise in designing and implementing MCP registries, gateways, servers, and authorization flows
- Hands-on experience supporting LLM-based workloads in production environments
- Experience designing and enforcing AI security controls at the platform layer in a regulated or security-sensitive environment
- Track record of building production-quality agentic AI patterns: tool use, function calling, MCP gateways and servers, retrieval-augmented generation, and human-in-the-loop workflows
- Track record of building production-quality platforms and developer-facing services, with emphasis on usability, standardization, and reliability
- Strong written and verbal communication skills, with the ability to work effectively across security, application, and infrastructure teams