Research Engineer - ML Sys & Infra

Storm3
San Jose, CA

⚡ Research Engineer - ML Sys, Infra Optimization and Scaling

📈 Foundation Models, AI Research Institute

🌎 San Francisco Bay Area, USA

💸 $250,000 - $400,000 salary + bonus


Come join a frontier AI lab in the Bay Area that is developing large-scale foundation models and has already delivered high-impact breakthroughs across LLMs, RL and Multimodal AI.


This is an opportunity to join as an early member of a rapidly growing team led by A-listers in tech & academia. We are seeking Engineers/Researchers skilled in distributed training, inference optimization and scaling laws for large-scale deep learning clusters and infrastructure.


Responsibilities:

  • Research & implement SOTA methods in ML Systems, Inference Optimization and HPC
  • Set up / improve distributed training infrastructure and optimizer frameworks
  • Write production-quality code and ensure reliability at scale
  • Work collaboratively in a tight-knit team to compete on the global AI leaderboards


Requirements:

  • MS or PhD in Computer Science, Computer Engineering, or a related field
  • Expertise in distributed ML frameworks (e.g., DeepSpeed, FSDP, vLLM)
  • Multi-node (Ray, Kubernetes) and distributed inference optimization experience
  • Experience working with CUDA and deep learning optimization in an HPC environment
  • Proficiency in leveraging high-compute GPU clusters


Why apply:

  • Opportunity to build out a new division at the forefront of AI innovation
  • FAANG-competitive salary & package
  • Work alongside superstars from FAANG labs & leading AI companies
  • Medical, Dental and Vision Insurance
  • Relocation package available


📧 Interested in applying? Please click on the ‘Easy Apply’ button or alternatively email me your resume at anir.gantugs@storm3.com
