[Remote] Software Engineer Intern (Inference Infrastructure) - 2026 Summer (MS/BS)
Note: This is a remote position open to candidates in the USA. ByteDance is a rapidly growing tech company focused on inspiring creativity and enriching life through innovative products. The Inference Infrastructure team is seeking a Software Engineer Intern to design and build large-scale, container-based cluster management systems and collaborate on AI inference solutions. This internship offers students hands-on experience with real-world scenarios in a hyper-scale environment.
Responsibilities
- Design and build large-scale, container-based cluster management and orchestration systems with extreme performance, scalability, and resilience
- Architect next-generation cloud-native GPU and AI accelerator infrastructure to deliver cost-efficient and secure ML platforms
- Collaborate across teams to deliver world-class inference solutions using vLLM, SGLang, TensorRT-LLM, and other LLM engines
- Stay current with the latest advances in open source (Kubernetes, Ray, etc.), AI/ML and LLM infrastructure, and systems research; integrate best practices into production systems
- Write high-quality, production-ready code that is maintainable, testable, and scalable
Skills
- B.S./M.S. in Computer Science, Computer Engineering, or related fields with 2+ years of relevant experience
- Able to commit to working for 12 weeks during Summer 2026
- Strong understanding of large model inference, distributed and parallel systems, and/or high-performance networking systems
- Hands-on experience building cloud or ML infrastructure in areas such as resource management, scheduling, request routing, monitoring, or orchestration
- Solid knowledge of container and orchestration technologies (Docker, Kubernetes)
- Proficiency in at least one major programming language (Go, Rust, Python, or C++)
- Experience contributing to or operating large-scale cluster management systems (e.g., Kubernetes, Ray)
- Experience with workload scheduling, GPU orchestration, scaling, and isolation in production environments
- Hands-on experience with GPU programming (CUDA) or inference engines (vLLM, SGLang, TensorRT-LLM)
- Familiarity with public cloud providers (AWS, Azure, GCP) and their ML platforms (SageMaker, Azure ML, Vertex AI)
- Strong knowledge of ML systems (Ray, DeepSpeed, PyTorch) and distributed training/inference platforms
- Excellent communication skills and ability to collaborate across global, cross-functional teams
- Passion for system efficiency, performance optimization, and open-source innovation
Benefits
- Day one access to health insurance
- Life insurance
- Wellbeing benefits
- 10 paid holidays per year
- Paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half)
- Housing allowance