[Remote] Student Researcher [Seed Vision – Multimodal Joint Modeling] – 2026 Start (PhD)
Note: This is a remote position open to candidates in the USA. ByteDance is a leading company in AI foundation models, focused on advanced research and technological development. The Student Researcher role involves conducting research on multimodal generative models and contributing to foundational models for visual generation.
Responsibilities
- Conduct research on joint training of vision, language, and video models under a unified architecture
- Develop scalable and efficient methods for autoregressive-style multimodal pretraining, supporting both understanding and generation (see the illustrative sketch after this list)
- Explore cross-modal tokenization, alignment, and shared representation strategies
- Investigate instruction tuning, captioning, and open-ended generation capabilities across modalities
- Contribute to system-level improvements in data curation, model optimization, and evaluation pipelines
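The responsibilities above center on training one autoregressive model over a shared token space for text and vision. The sketch below is a minimal, hypothetical PyTorch illustration of that idea (next-token prediction over a unified text-plus-image-token vocabulary); the vocabulary sizes, model dimensions, and class names are assumptions for illustration only, not the team's actual architecture or codebase.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen only for this toy example.
TEXT_VOCAB, IMAGE_VOCAB = 32000, 8192
VOCAB = TEXT_VOCAB + IMAGE_VOCAB        # unified token space: text + discretized image tokens
D_MODEL, N_HEADS, N_LAYERS, SEQ_LEN = 256, 4, 4, 128

class UnifiedAR(nn.Module):
    """Decoder-only Transformer doing next-token prediction over a shared vocabulary."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(SEQ_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            D_MODEL, N_HEADS, dim_feedforward=4 * D_MODEL,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, N_LAYERS)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, tokens):
        # tokens: (batch, seq) integer ids drawn from the unified vocabulary
        b, t = tokens.shape
        x = self.embed(tokens) + self.pos(torch.arange(t, device=tokens.device))
        causal_mask = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        x = self.blocks(x, mask=causal_mask)
        return self.head(x)                  # next-token logits for both modalities

# One training step: interleaved text and image tokens share a single cross-entropy loss.
model = UnifiedAR()
tokens = torch.randint(0, VOCAB, (2, SEQ_LEN))   # placeholder mixed-modality sequences
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
```

Because both modalities are flattened into one sequence and trained with the same objective, this setup supports understanding (predicting text conditioned on image tokens) and generation (predicting image tokens conditioned on text) in a single architecture, which is the framing the role targets.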
Skills
- Currently pursuing a PhD in Computer Vision, Machine Learning, NLP, or a related field
- Research experience in multimodal learning, large-scale pretraining, or vision-language modeling
- Proficiency in deep learning frameworks such as PyTorch or JAX
- Demonstrated ability to conduct independent research, with publications in top-tier conferences such as CVPR, ICCV, ECCV, NeurIPS, ICML, or ICLR
- Experience with autoregressive LLM training, especially in multimodal or unified modeling settings
- Familiarity with instruction tuning, vision-language generation, or unified token space design
- Background in model scaling, efficient training, or data mixture strategies
- Ability to work closely with infrastructure teams to deploy large-scale training workflows
Benefits
- Day one access to health insurance
- Life insurance
- Wellbeing benefits
- 10 paid holidays per year
- Paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half)
- Housing allowance