[Remote] Student Researcher [Seed LLM Post Training – Reward Modeling] - 2026 Start (PhD)
Note: This is a remote position open to candidates in the USA. ByteDance is dedicated to pioneering advanced AI foundation models and is seeking a Student Researcher for its Seed LLM Post Training team. The role involves researching and developing reward models, improving controllability and instruction-following performance, and contributing to data selection and synthesis pipelines.
Responsibilities
- Design and train reward models that reflect nuanced human preferences in LLM outputs
- Develop and evaluate components of a Reward Model System that integrates model predictions, verifier feedback, tool usage, and agent signals to produce reliable, generalizable reward estimates
- Develop reward models to enhance controllability and instruction-following performance, especially in scenarios involving complex, multi-part user requests
- Contribute to data selection and synthesis pipelines that improve post-training data quality, leveraging reward signals to expand the model's capabilities
- Research scalable methods for learning from pairwise comparisons, rankings, or human demonstrations across diverse tasks
Skills
- Currently pursuing a PhD in Computer Science, Machine Learning, or a related technical field
- First-author publications in top-tier venues (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP)
- Research experience in reward modeling, human preference learning, or LLM post-training
- Proficient in Python and deep learning frameworks such as PyTorch or JAX
- Must obtain work authorization in the country of employment by the time of hire and maintain it throughout employment
- Experience with RLHF, DPO, rejection sampling, or ranking-based supervision methods
- Familiarity with model-based reward composition, verifier integration, or synthetic data pipelines
- Understanding of how reward models interact with large-scale RL and agent systems
Benefits
- Day-one access to health insurance for interns
- Life insurance
- Wellbeing benefits
- 10 paid holidays per year
- Paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half)
- Housing allowance