Director of AI (FDE)

Colossus Technologies Group

San Francisco, CA

Director of Agent Systems Engineering (Forward Deployed Engineering)

Location: Remote (U.S.) — Monthly team meetups

Compensation: Up to ~$300K base + equity

About the Role

We’re partnering with a fast-growing healthtech company building AI agents that operate real clinical workflows — including patient intake, administrative automation, and clinician support.

After deploying these systems in production, one thing became clear:

The hard problem isn’t the model — it’s making the system reliable enough to run real workflows.

These workflows are long-running, touch multiple systems (EHRs, APIs, internal tools), and require correctness, traceability, and resilience when things break.

To support this, the company is scaling its Forward Deployed Engineering (FDE) function from roughly 20 to 50 engineers this year.

We’re hiring a Director of Agent Systems Engineering to lead part of this organization.

What You’ll Do

This role sits at the intersection of AI engineering and real-world deployment.

You will lead teams responsible for turning complex workflows into production-grade agent systems — and ensuring those systems are reliable, observable, and repeatable.

Key responsibilities include:

Designing repeatable delivery systems for deploying agent workflows into production
Mentoring and scaling engineering pods, driving execution and technical excellence
Owning capacity planning and delivery predictability across multiple concurrent deployments
Setting the technical bar for reliability, observability, and system correctness
Driving architecture and system design, including debugging multi-step workflows and their failure modes
Feeding learnings back into the core platform as reusable primitives and abstractions

This is a hands-on leadership role — you’ll be close to architecture, system behavior, and real production issues.

What We’re Looking For

We’re looking for engineers who think in systems, not just models.

Strong candidates will have:

Experience building and scaling distributed systems or platform infrastructure
Exposure to AI/LLM-based systems, ideally including agent workflows or orchestration
A deep understanding of reliability, observability, and failure handling in production systems
Experience working with complex, multi-step workflows across multiple services or APIs
A track record of turning repeated patterns into reusable platform capabilities
Leadership experience managing teams and driving delivery in ambiguous environments

Backgrounds may include: