Location: Onsite – San Francisco, CA
Employment Type: Full-Time
About the Role:
We are seeking an experienced Principal Agentic AI Engineer to lead the design, architecture, and delivery of an enterprise-scale Agentic AI platform. This individual will drive the technical vision for multi-agent AI systems, Retrieval-Augmented Generation (RAG), MCP-based tool integrations, and scalable microservices that let enterprises compose, govern, and operate domain-specific agents at scale. The role also promotes reusability by codifying patterns into shared skills and sub-agents across the Agentic Development Lifecycle (ADLC).
Responsibilities:
- Agentic AI Architecture: Own end-to-end design of multi-agent systems using LangChain, LangGraph, and Model Context Protocol (MCP) — including planner-executor patterns, sub-agent hierarchies, tool routing, retries, and cost-aware token budgeting.
- RAG & Knowledge Systems: Architect production-grade RAG pipelines with vector databases (pgvector, Qdrant), hybrid retrieval, re-ranking, and document-aware chunking to ground agents in enterprise knowledge.
- Solution Architecture: Design reference architectures and solution blueprints for enterprise clients across regulated and consumer-facing industries — translating business outcomes into agentic AI roadmaps and reusable accelerators.
- Scalable Microservices: Build event-driven microservices on Kafka, polyglot data layers with PostgreSQL and vector DBs, and Kubernetes-based deployment topologies for high-throughput inference workloads.
- MLOps & Model Lifecycle: Establish practices spanning training, fine-tuning, prompt and config versioning, structured evaluations against golden datasets, drift detection, and automated rollback when output quality degrades.
- Traceability & Observability: Instrument agent reasoning traces, tool-call audit trails, token spend, and quality signals with Prometheus, Grafana, and OpenTelemetry, enabling policy enforcement and human-in-the-loop oversight.
- Reusable Engineering Standards: Codify AI engineering patterns (RAG retrievers, agent loops, eval harnesses, traceability spans) into reusable skills, sub-agents, and platform components consumed across multiple product lines.
- Rapid Engineering in the Agentic Development Lifecycle: Roll out AI-led developer tools and sub-agents (Claude Code, Playwright MCP) across planning, code generation, code review, test authoring, and release validation, accelerating delivery while standardizing quality.
- Presales & Client Engagement: Partner with sales, presales, and customer success on enterprise pursuits — authoring solution designs, leading technical workshops, and shaping agentic AI roadmaps for prospects and existing clients.
Qualifications:
- 8+ years of software engineering and solution architecture experience
- 3+ years of hands-on experience designing and deploying LLM-based or Agentic AI systems in production environments
- Deep expertise with LangChain, LangGraph, Retrieval-Augmented Generation (RAG), MCP / AI tool orchestration, prompt engineering, context engineering, and token optimization
- Strong programming experience in Python and TypeScript (Java preferred)
- Experience building scalable microservices using FastAPI, Spring Boot, Node.js, or related frameworks
- Hands-on experience with AWS, Azure, or GCP, plus Kubernetes, Docker, Terraform, and CI/CD pipelines
- Strong understanding of MLOps, AI model lifecycle management, evaluation frameworks, drift detection, and AI observability
- Proven enterprise solution architecture experience translating business requirements into scalable AI solutions