Staff Software Engineer – High Performance Computing & Machine Learning Infrastructure Development (Cloud Platform Engineering)
Join arenaflex's Cloud Infrastructure Team
Are you ready to shape the future of cloud computing? At arenaflex, we're building the infrastructure that powers the next generation of computing—from artificial intelligence to high-performance scientific simulations. We're looking for a talented Staff Software Engineer to join our Cloud Platform team and help us push the boundaries of what's possible in distributed computing.
As a Staff Software Engineer specializing in High Performance Computing (HPC) and Machine Learning (ML) infrastructure, you'll be at the forefront of developing cutting-edge solutions that enable researchers, enterprises, and developers to solve the world's most complex problems. Your work will directly impact how billions of users interact with cloud-based computational resources, from training large language models to running complex scientific simulations.
What You'll Do
In this role, you'll be responsible for designing, developing, and maintaining the critical infrastructure that powers the high-performance computing capabilities of arenaflex Cloud Platform (ACP). You'll work across the full stack, from kernel-level optimizations to user-facing HPC and ML applications, ensuring our platform delivers unparalleled performance, reliability, and scalability.
Key Responsibilities
- Full-Stack HPC & ML Development: Design and implement HPC and ML execution infrastructure on arenaflex Cloud Platform, including kernel optimization, userspace communication libraries (such as MPI implementations, libfabric, and NCCL), and client-facing HPC and ML applications that leverage our platform's capabilities.
- System Architecture: Architect and build large-scale distributed systems and networks that can handle massive computational workloads, ensuring optimal performance across geographically distributed data centers.
- Performance Optimization: Develop and optimize Linux kernel components, device drivers, and operating system subsystems to maximize hardware utilization and minimize latency for compute-intensive workloads.
- Technical Leadership: Set technical direction and standards for a team of engineers, providing mentorship and guidance while driving architectural decisions that impact platform-wide capabilities.
- Cross-Functional Collaboration: Work closely with product managers, other engineering teams, and customers to understand requirements and deliver solutions that exceed expectations.
- Code Quality & Testing: Design and implement comprehensive testing strategies, including unit tests, integration tests, and performance benchmarks, to ensure the reliability of software releases.
- Infrastructure Development: Build and maintain wide-reaching infrastructure components that support our cloud platform's core functionality.
- Continuous Improvement: Identify opportunities for improvement in our technology stack, development processes, and system performance, taking ownership of implementing positive changes.
What We're Looking For
Essential Qualifications
To succeed in this role, you'll need a strong technical background and proven experience in software development. We're seeking candidates who can demonstrate the following:
- Educational Background: Bachelor's degree in Computer Science, Engineering, or a related technical field. We'll also consider equivalent practical experience in lieu of formal education.
- Programming Experience: Minimum of 3 years of hands-on software development experience, with deep expertise in data structures and algorithms.
- Software Engineering Lifecycle: At least 3 years of experience testing and shipping software products to production environments, with a minimum of 1 year in software design and architecture.
- Systems Development: 3+ years of experience building and developing large-scale infrastructure, distributed systems, or networks that serve high-volume traffic.
- Full-Stack Capabilities: Experience working across the entire technology stack, from low-level system programming to application-level development.
- Leadership Potential: Demonstrated ability to take ownership of projects and guide technical decisions for small to medium-sized engineering teams.
Preferred Qualifications
While not required, the following qualifications will help you stand out and make an immediate impact:
- Advanced Degree: Master's degree or PhD in Engineering, Computer Science, or a related technical field.
- Low-Level Systems Expertise: Direct experience with C++ programming, Linux kernel development, device drivers, and Remote Direct Memory Access (RDMA) technologies.
- Operating Systems Proficiency: Hands-on experience with Linux device drivers, networking stacks, operating system tuning, and software packaging.
- HPC & ML Communications: In-depth knowledge of HPC and ML communication frameworks, including MPI (Message Passing Interface), collective communication libraries, libfabric, and socket programming.
- Cloud Platform Experience: Prior experience developing HPC or ML solutions on major cloud platforms.
- Performance Optimization: Track record of optimizing system performance for compute-intensive workloads.
Skills & Competencies
Beyond technical qualifications, we value engineers who bring creativity, passion, and collaborative spirit to our team:
- Problem-Solving: Analytical mindset with the ability to tackle complex technical challenges and find elegant solutions.
- Communication: Strong written and verbal communication skills, with the ability to explain technical concepts to diverse audiences.
- Adaptability: Willingness to learn new technologies and take on challenges across the full stack as our business evolves.
- Initiative: Self-motivated approach with the ability to identify and address issues proactively.
- Collaboration: Team player who thrives in cross-functional environments and enjoys mentoring others.
- Leadership: Demonstrated leadership qualities and the passion to take on new challenges as we continue to push technology forward.
Career Growth & Learning Opportunities
At arenaflex, we believe in investing in our people. As a Staff Software Engineer, you'll have access to:
- Professional Development: Comprehensive training programs, conferences, and certification opportunities to enhance your technical skills.
- Career Advancement: Clear career paths with opportunities to grow into principal engineer, technical lead, or management roles.
- Technical Mobility: The flexibility to switch teams and projects as you and our business evolve, allowing you to explore different areas of our technology stack.
- Cutting-Edge Work: Exposure to the latest technologies and methodologies in cloud computing, AI, and distributed systems.
- Mentorship: Opportunities to receive mentorship from senior leaders and to mentor junior team members.
Work Environment & Culture
arenaflex is more than a technology company—it's a community of innovators, problem-solvers, and dreamers. Here's what you can expect:
- Innovation-First Mindset: We encourage creative thinking and bold ideas. Your contributions will directly influence how billions of users connect with technology.
- Inclusive Culture: We value diverse perspectives and create an environment where everyone feels welcome and empowered to do their best work.
- Work-Life Balance: Flexible work arrangements, including remote work options, to help you maintain balance in your professional and personal life.
- Collaborative Spaces: Modern office spaces designed for collaboration, creativity, and productivity.
- Team Building: Regular team events, hackathons, and social activities that foster strong relationships among colleagues.
Compensation & Benefits
We recognize that exceptional talent deserves exceptional rewards. arenaflex offers a comprehensive compensation package that includes:
- Competitive Salary: Industry-leading salary commensurate with experience and qualifications.
- Equity: Stock options or equity participation program to share in our success.
- Health & Wellness: Comprehensive health, dental, and vision insurance plans.
- Retirement Benefits: 401(k) matching and retirement savings programs.
- Paid Time Off: Generous vacation policy, sick leave, and holidays.
- Parental Leave: Extensive parental leave programs for new parents.
- Learning Budget: Annual budget for professional development, courses, and certifications.
- Perks & Discounts: Access to exclusive perks, discounts, and employee wellness programs.
Location
This position is based in Sunnyvale, USA, with flexibility for hybrid or remote work arrangements. arenaflex supports flexible work models that help you do your best work while maintaining work-life balance.
Join Us
At arenaflex, our mission is to organize the world's information and make it universally accessible and useful. We're looking for engineers who share this vision and are passionate about building the infrastructure that powers the future of computing.
If you're ready to take on exciting challenges, work with cutting-edge technology, and make a real impact, we encourage you to apply. Bring your curiosity, creativity, and technical expertise—and help us shape the future of cloud computing.
arenaflex is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We look forward to receiving your application!