
[Remote] Student Researcher [Seed Vision – Multimodal Video Generation] – 2026 Start (PhD)

Remote, USA | Full-time | Posted 2025-11-24

Note: This is a remote position open to candidates in the USA. ByteDance is a company focused on advanced AI foundation models. In this role you will conduct research on multimodal video generation and collaborate with researchers and engineers to improve generative models for visual content.


Responsibilities

  • Conduct research on multimodal video generation, with a focus on improving semantic alignment between inputs and generated content
  • Integrate vision-language models (e.g., CLIP, pre/post-trained VLMs) into video generation architectures to enhance input understanding
  • Explore and implement joint training or fine-tuning approaches that couple VLMs with video generation backbones
  • Evaluate model performance on tasks requiring high-level reasoning or detailed semantic control over generation
  • Collaborate with researchers and engineers to iterate on prototypes within an existing infrastructure

Skills

  • Currently pursuing a PhD in Computer Vision, Machine Learning, or a related field
  • Research experience in one or more of the following areas: Vision-language models (VLMs); Multimodal or joint model training; Video generation
  • Solid coding ability and a clean research implementation style; you will be expected to work in a production-grade codebase (e.g., PyTorch)
  • Demonstrated research ability, with first-author publications in top-tier ML/CV/AI conferences such as CVPR, ICCV, ECCV, and ICLR
  • Experience in training or fine-tuning autoregressive or diffusion-based video generation models
  • Background in multimodal instruction-following, alignment, or conditioning for generation tasks
  • Understanding of evaluation techniques for assessing semantic consistency in generated video

Benefits

  • Health insurance
  • Life insurance
  • Wellbeing benefits
  • 10 paid holidays per year
  • Paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half)
  • Housing allowance

Company Overview

  • ByteDance is a technology company that develops content creation platforms and services. It was founded in 2012 and is headquartered in Beijing, China, with a workforce of more than 10,000 employees. Its website is http://bytedance.com.

Company H1B Sponsorship

  • ByteDance has a track record of offering H1B sponsorships: 1350 in 2025, 1123 in 2024, 775 in 2023, 487 in 2022, 417 in 2021, and 245 in 2020. Please note that this does not guarantee sponsorship for this specific role.
