Machine Learning Research Engineer, Enterprise ML Systems
Scale AI is a leading AI data foundry focused on accelerating the development of AI applications. The Machine Learning Research Engineer will build algorithms for a next-gen Agent RL training platform, support large-scale training, and integrate state-of-the-art technologies to optimize ML systems for enterprise clients.
Responsibilities
- Build, profile, and optimize our training and inference framework
- Post-train state-of-the-art models, both internally developed and from the community, to define stable post-training recipes for our enterprise engagements
- Collaborate with ML teams to accelerate their research and development and enable them to build the next generation of models and data curation methods
- Create a next-gen agent training algorithm for multi-agent/multi-tool rollouts
Skills
- 1-3 years of experience training LLMs in a production environment
- Passion for system optimization
- Experience with post-training methods such as RLHF and RLVR, and related algorithms such as PPO and GRPO
- Demonstrated understanding of modern GPU cluster architecture and how to operate it
- Experience with multi-node LLM training and inference
- Strong software engineering skills and proficiency in frameworks and tools such as CUDA, PyTorch, transformers, and FlashAttention
- Strong written and verbal communication skills for working in a cross-functional team environment
- PhD or Master's in Computer Science or a related field
Benefits
- Comprehensive health, dental and vision coverage
- Retirement benefits
- A learning and development stipend
- Generous PTO
- A commuter stipend