AI Platform Engineer

Apollo Solutions

Boston, MA

Senior AI Platform Engineer

Job Description

We are building an AI Engineering function to enable productivity and agentic capabilities across the firm, for end users, developers, and business teams.

As a Senior AI Platform Engineer, you will design and own the shared platform that powers AI systems firm-wide: inference services, agentic platforms, developer tooling, and observability.

This is a financial services environment where data protection, auditability, and regulatory compliance are foundational requirements. You will ensure that AI capabilities are secure by default, auditable end-to-end, and easy for engineering teams to adopt.

You will report to the Head of AI Engineering and partner closely with Security Engineering, AI Integration/Application teams, and core infrastructure groups.

Responsibilities

Platform Infrastructure

Design, build, and operate the core AI platform, including managed LLM inference services (Amazon Bedrock and related services), model access management, versioning, and routing across foundation models
Design and operate shared integration layers, including Model Context Protocol (MCP) servers, an MCP registry/gateway, and authorization services that connect AI platforms with core firm systems
Design and operate AI productivity data pipelines and dashboards for usage, cost, and adoption metrics
Design the infrastructure that supports AI-assisted developer tooling (Linux VDI environments), office productivity integrations (M365/Excel), and autonomous agent frameworks
Develop standardized inference and agentic AI platforms that teams can adopt across use cases, including reusable components for RAG, vector databases, and model integration patterns

Security & Guardrails

Partner with Security Engineering to embed security controls across the full AI lifecycle
Design, with the AI Security Engineer and infrastructure/platform teams, the controls that prevent destructive agent actions: filesystem permissions, IAM policies, network allowlists, sandbox configurations, and execution-time policy enforcement
Architect a default-deny posture: agents and tools access only explicitly permitted resources, with no ability to modify or delete production data unless specifically authorized through a human-approval workflow
Implement pre-execution guardrails (hooks, policy engines) that intercept and validate agent actions before they run
Ensure AI workloads operate within the corporate network boundary, using VPC endpoints and PrivateLink, with no public internet egress for inference traffic
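
For candidates who want a concrete picture of the default-deny posture and pre-execution guardrails described above, the following is a minimal sketch, not a description of our actual implementation. All names here (AgentAction, Policy, guarded_execute, the "prod/" resource prefix, and the set of destructive verbs) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


# Assumption: verbs treated as destructive in this sketch.
DESTRUCTIVE_VERBS = frozenset({"write", "delete"})


@dataclass(frozen=True)
class AgentAction:
    agent_id: str
    verb: str       # e.g. "read", "write", "delete"
    resource: str   # e.g. "prod/ledger" (hypothetical naming scheme)


@dataclass
class Policy:
    # Explicit grants as (agent_id, verb, resource); anything absent is denied.
    grants: set = field(default_factory=set)
    # Grants that a human has approved for destructive production access.
    approvals: set = field(default_factory=set)

    def evaluate(self, action: AgentAction) -> Decision:
        key = (action.agent_id, action.verb, action.resource)
        if key not in self.grants:
            return Decision.DENY  # default-deny: no explicit grant
        is_destructive = action.verb in DESTRUCTIVE_VERBS
        if is_destructive and action.resource.startswith("prod/"):
            if key not in self.approvals:
                return Decision.NEEDS_APPROVAL
        return Decision.ALLOW


def guarded_execute(policy: Policy, action: AgentAction, tool):
    """Pre-execution hook: validate the action before the tool runs."""
    decision = policy.evaluate(action)
    if decision is not Decision.ALLOW:
        raise PermissionError(
            f"{decision.value}: {action.verb} on {action.resource}"
        )
    return tool(action)
```

In this sketch the tool callable is never invoked unless the policy returns ALLOW, so a missing grant or a missing human approval stops the action before it has any effect.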

Enablement & Scale

Build self-service onboarding so teams can consume AI platform services with appropriate access controls
Design systems that enable cost-effective operation of AI workloads, including quota management and chargeback visibility
Operate firm-wide AI applications and centrally managed AI services
Define reference architectures and patterns that other engineering teams use to build on the platform

Qualifications

10+ years as an infrastructure, platform, or systems engineer, with demonstrated experience building and operating shared services consumed by multiple teams, both on-premises and on AWS
Strong expertise in Amazon Bedrock (inference and AgentCore) and Azure OpenAI
Strong expertise in designing and implementing MCP registries, gateways, servers, and authorization flows
Hands-on experience supporting LLM-based workloads in production environments
Experience designing and enforcing AI security controls at the platform layer in a regulated or security-sensitive environment
Track record of building production-quality agentic AI patterns: tool use, function calling, MCP gateways and servers, retrieval-augmented generation, and human-in-the-loop workflows
Track record of building production-quality platforms and developer-facing services, with emphasis on usability, standardization, and reliability
Strong written and verbal communication skills, with the ability to work effectively across security, application, and infrastructure teams