Machine Learning Engineer (Inference)

Acceler8 Talent
San Francisco, CA (On-Site)

$200,000-$300,000 + equity


Why this role

An early-stage infrastructure company building a next-generation AI cloud (neocloud), rethinking how models run across heterogeneous hardware.


You’ll own the layer that actually executes models in production.


🧠 What you’ll do

  • Build end-to-end inference systems (request → runtime → response)
  • Optimise for latency, throughput, and concurrency under real load
  • Design batching, scheduling, and queuing systems
  • Manage KV cache + memory at scale
  • Debug performance across model → runtime → hardware


⚙️ The fun technical bits

  • Deep dives into LLM inference (prefill, decode, attention)
  • Solving tail latency + throughput trade-offs
  • Working across systems, ML, and hardware layers
  • Optimising across GPUs + next-gen accelerators
  • Hands-on with vLLM, TensorRT-LLM, or custom runtimes


🎯 What they want

  • Experience with ML inference / model serving systems
  • Strong systems or backend engineering fundamentals
  • Comfortable with performance, memory, and scaling challenges
  • Proficiency in Python and C++