[Remote] AI Researcher — Inference Optimization
Note: This is a remote role open to candidates in the USA. Featherless AI is seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. The role focuses on improving latency, throughput, and cost efficiency in real-world production environments, both by developing new inference optimization techniques and by working with engineering teams to deploy optimized pipelines.
Responsibilities
• Research and develop techniques to optimize inference performance for large neural networks
• Improve latency, throughput, memory efficiency, and cost per inference
• Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications)
• Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, prefill vs. decode optimization)
• Benchmark inference workloads across hardware accelerators
• Collaborate with engineering teams to deploy optimized inference pipelines
• Translate research insights into production-ready improvements
Skills
• Strong background in machine learning, deep learning, or AI systems
• Hands-on experience optimizing inference for large-scale models
• Proficiency in Python and modern ML frameworks (e.g., PyTorch)
• Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime)
• Ability to design experiments and communicate results clearly
• Experience deploying production inference systems at scale
• Familiarity with distributed and multi-GPU inference
• Experience contributing to open-source ML or inference frameworks
• Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields
• Experience working close to hardware (CUDA, ROCm, profiling tools)
Company Overview
• Featherless AI enables serverless inference through its GPU orchestration and model load-balancing system. The company was founded in 2023 and is headquartered in San Francisco, California, USA, with a workforce of 2-10 employees. Its website is https://featherless.ai/.
Company H1B Sponsorship
• Featherless AI has a track record of sponsoring H1B visas, including one in 2025. Please note that this does not guarantee sponsorship for this specific role.