⚡ Research Engineer - ML Sys, Infra Optimization and Scaling
📈 Foundation Models, AI Research Institute
🌎 San Francisco Bay Area, USA
💸 $250,000 - $400,000 salary + bonus
Come join a frontier AI lab in the Bay Area that's developing large-scale foundation models and has already delivered high-impact breakthroughs across LLMs, RL and Multimodal AI.
This is an opportunity to join as an early member of a rapidly growing team led by A-listers in tech & academia. We are seeking Engineers/Researchers skilled in distributed training, inference optimization and scaling laws for large-scale deep learning clusters and infrastructure.
Responsibilities:
- Research & implement SOTA methods in ML Systems, Inference Optimization and HPC
- Set up / improve distributed training infrastructure and optimizer frameworks
- Write production-quality code and ensure reliability at scale
- Work collaboratively in a tight-knit team to compete on the global AI leaderboards
Requirements:
- MS or PhD in Comp Sci, Comp Engineering, or a related field
- Expertise in distributed ML frameworks (e.g., DeepSpeed, FSDP, vLLM)
- Multi-node (Ray, Kubernetes) and distributed inference optimization experience
- Experience working with CUDA and deep learning optimization in an HPC environment
- Proficiency in leveraging high-compute GPU clusters
Why apply:
- Opportunity to build out a new division at the forefront of AI innovation
- FAANG-competitive salary & package
- Work alongside superstars from FAANG labs & leading AI companies
- Medical, Dental and Vision Insurance
- Relocation package available
📧 Interested in applying? Please click on the ‘Easy Apply’ button or alternatively email me your resume at anir.gantugs@storm3.com