Principal AI Engineer

Flashii

San Francisco County, CA

Location: Onsite – San Francisco, CA

Employment Type: Full-Time

About the Role:

We are seeking an experienced Principal Agentic AI Engineer to lead the design, architecture, and delivery of an enterprise-scale Agentic AI platform. This individual will drive the technical vision for multi-agent AI systems, Retrieval-Augmented Generation (RAG), MCP-based tool integrations, and scalable microservices that let enterprises compose, govern, and operate domain-specific agents at scale — and drive reusability by codifying patterns into shared skills and sub-agents across the Agentic Development Lifecycle (ADLC).

Responsibilities:

Agentic AI Architecture: Own end-to-end design of multi-agent systems using LangChain, LangGraph, and Model Context Protocol (MCP) — including planner-executor patterns, sub-agent hierarchies, tool routing, retries, and cost-aware token budgeting.
RAG & Knowledge Systems: Architect production-grade RAG pipelines with vector databases (pgvector, Qdrant), hybrid retrieval, re-ranking, and document-aware chunking to ground agents in enterprise knowledge.
Solution Architecture: Design reference architectures and solution blueprints for enterprise clients across regulated and consumer-facing industries — translating business outcomes into agentic AI roadmaps and reusable accelerators.
Scalable Microservices: Build event-driven microservices on Kafka, polyglot data layers with PostgreSQL and vector DBs, and Kubernetes-based deployment topologies for high-throughput inference workloads.
MLOps & Model Lifecycle: Establish practices spanning training, fine-tuning, prompt and config versioning, structured evaluations against golden datasets, drift detection, and automated rollback when output quality degrades.
Traceability & Observability: Instrument agent reasoning traces, tool-call audit trails, token spend, and quality signals with Prometheus, Grafana, and OpenTelemetry, enabling policy enforcement and human-in-the-loop oversight.
Reusable Engineering Standards: Codify AI engineering patterns (RAG retrievers, agent loops, eval harnesses, traceability spans) into reusable skills, sub-agents, and platform components consumed across multiple product lines.
Rapid Engineering in the Agentic Development Lifecycle: Roll out AI-led developer tools and sub-agents (Claude Code, Playwright MCP) across planning, code generation, code review, test authoring, and release validation, accelerating delivery while standardizing quality.
Presales & Client Engagement: Partner with sales, presales, and customer success on enterprise pursuits — authoring solution designs, leading technical workshops, and shaping agentic AI roadmaps for prospects and existing clients.

Qualifications:

8+ years of software engineering and solution architecture experience
3+ years of hands-on experience designing and deploying LLM-based or Agentic AI systems in production environments
Deep expertise with LangChain, LangGraph, Retrieval-Augmented Generation (RAG), MCP / AI tool orchestration, prompt engineering, context engineering, and token optimization
Strong programming experience in Python and TypeScript (Java preferred)
Experience building scalable microservices using FastAPI, Spring Boot, Node.js, or related frameworks
Hands-on experience with AWS, Azure, or GCP, plus Kubernetes, Docker, Terraform, and CI/CD pipelines
Strong understanding of MLOps, AI model lifecycle management, evaluation frameworks, drift detection, and AI observability
Proven enterprise solution architecture experience translating business requirements into scalable AI solutions