Machine Learning Engineer (Inference)
San Francisco, On-Site
$200,000–$300,000 + equity
Why this role
Early-stage infra company building a next-gen AI cloud (a "neocloud"), rethinking how models run across heterogeneous hardware.
You’ll own the layer that actually executes models in production.
🧠 What you’ll do
- Build end-to-end inference systems (request → runtime → response)
- Optimise for latency, throughput, and concurrency under real load
- Design batching, scheduling, and queuing systems (see the sketch after this list)
- Manage KV cache + memory at scale
- Debug performance across model → runtime → hardware
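To make the batching/queuing bullet concrete, here is a minimal sketch of a dynamic batcher in plain asyncio. Every name in it (DynamicBatcher, run_batch, max_wait_ms) is illustrative, not the company's actual stack; a real serving system would add continuous batching, preemption, and backpressure on top of this pattern.

```python
import asyncio

class DynamicBatcher:
    """Collects concurrent requests into batches for one fused forward pass."""

    def __init__(self, run_batch, max_batch_size=8, max_wait_ms=5):
        self.run_batch = run_batch            # list[request] -> list[response]
        self.max_batch_size = max_batch_size  # cap on batch width
        self.max_wait_ms = max_wait_ms        # latency budget spent waiting to batch
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item):
        """Enqueue one request and await its response."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        """Scheduler loop: drain the queue into batches and dispatch them."""
        while True:
            batch = [await self.queue.get()]   # block until at least one request
            loop = asyncio.get_running_loop()
            deadline = loop.time() + self.max_wait_ms / 1000
            # Grow the batch until it is full or the wait budget is spent:
            # this is the core latency-vs-throughput trade-off.
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            results = self.run_batch([item for item, _ in batch])
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)

async def main():
    # Stand-in "model": uppercases each request in one batched call.
    batcher = DynamicBatcher(run_batch=lambda xs: [x.upper() for x in xs])
    scheduler = asyncio.create_task(batcher.run())
    print(await asyncio.gather(*(batcher.submit(s) for s in ["a", "b", "c"])))
    scheduler.cancel()

asyncio.run(main())
```

The `max_wait_ms` knob is the whole game: wait longer and throughput rises as batches fill, but tail latency grows with it.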
The fun technical bits
- Deep dives into LLM inference (prefill, decode, attention)
- Solving tail latency + throughput trade-offs
- Working across systems, ML, and hardware layers
- Optimising across GPUs + next-gen accelerators
- Hands-on with vLLM, TensorRT-LLM, or custom runtimes (a minimal vLLM example follows)
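Since vLLM is named, here is its standard offline-inference entry point; the model name is just a placeholder, and production deployments would tune scheduling and KV-cache settings well beyond this.

```python
# Minimal vLLM offline-inference sketch; the model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # loads weights, pre-allocates paged KV-cache blocks
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() runs prefill + decode with continuous batching under the hood.
outputs = llm.generate(["The key to low-latency inference is"], params)
print(outputs[0].outputs[0].text)
```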
🎯 What they want
- Experience with ML inference / model serving systems
- Strong systems or backend engineering fundamentals
- Comfortable with performance, memory, and scaling challenges
- Python + C++