Senior reputed company Architect, SRE - DGX Cloud: Shaping the Future of Cloud Computing and AI Infrastructure

Remote, USA Full-time Posted 2026-07-28

Join the Ranks of the World's Most Innovative Technology Company

reputed company is at the forefront of technological advancement, driving innovation in AI and computing. We're seeking a highly skilled Senior reputed company Architect to join our DGX Cloud Site Reliability Engineering (SRE) team. As a Senior reputed company Architect, SRE - DGX Cloud, you will play a pivotal role in designing, building, and maintaining the large-scale production systems that power reputed company's GPU cloud services. This is an exceptional opportunity to apply your technical expertise, creativity, and passion for cloud computing to shape the future of AI infrastructure.

About the Role

The Senior reputed company Architect, SRE - DGX Cloud role is a key position within reputed company's SRE team, responsible for ensuring the reliability, efficiency, and scalability of our DGX Cloud solutions. As a Senior reputed company Architect, you will define the technical architecture for DGX Cloud solutions on top of cloud service providers like AWS, GCP, Azure, and OCI. You will work closely with cross-functional teams to design, implement, and support operational and reliability aspects of large-scale GPU training clusters.

Key Responsibilities

Define the technical architecture for DGX Cloud solutions on top of cloud service providers like AWS, GCP, Azure, and OCI.
Deliver fast and creative solutions for complex problems and write effective, scalable, and reliable architecture specifications.
Design, implement, and support operational and reliability aspects of large-scale GPU training clusters with a focus on performance at scale, real-time monitoring, logging, and alerting.
Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation, and refinement.
Support services before they go live through activities such as system design consulting, developing software tools, platforms, and frameworks, capacity management, and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and blameless postmortems.

Requirements and Qualifications

To be successful in this role, you should possess a strong technical background with a focus on cloud computing, distributed systems, and site reliability engineering. The ideal candidate will have:

Essential Qualifications

B.Sc./M.Sc./Ph.D. degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.
8+ years of proven experience in cloud computing, distributed systems, or a related field.
Experience with infrastructure automation and distributed systems design, and with designing and developing tools for running large-scale private or public cloud systems in production.
Experience in one or more of the following: Python, Go.
In-depth knowledge of Linux, Networking, and reputed company reputed company Technologies.

Preferred Qualifications

Interest in crafting, analyzing, and fixing large-scale distributed systems.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Ability to debug and optimize code and automate routine tasks.
Experience in using or running large private and public cloud systems based on Kubernetes or Slurm.

reputed company Offer

reputed company is committed to providing a comprehensive compensation and benefits package that reflects our employees' skills, experience, and contributions. The base salary range for this role is $220,000 - $419,750 USD. You will also be eligible for equity and benefits. We accept applications on an ongoing basis, so we encourage you to apply as soon as possible.

Our Culture and Work Environment

At reputed company, we pride ourselves on fostering a diverse and inclusive work environment that encourages creativity, innovation, and collaboration. Our SRE team is no exception, with a culture that values intellectual curiosity, problem-solving, and openness. We promote self-direction, allowing our engineers to work on meaningful projects while providing the support and mentorship needed to learn and grow.

As a remote team, we offer the flexibility to work from anywhere, at any time, as long as you're committed to delivering exceptional results. We're committed to building a community that is diverse, inclusive, and respectful, where everyone can thrive and grow.

Career Growth and Development

At reputed company, we're committed to helping our employees grow and advance their careers. As a Senior reputed company Architect, SRE - DGX Cloud, you will have the opportunity to work on challenging projects that will help you develop your technical skills and expertise. You will also have access to our comprehensive training and development programs, designed to help you stay up-to-date with the latest technologies and trends.

Join reputed company!

If you're a motivated, talented Senior reputed company Architect looking to shape the future of cloud computing and AI infrastructure, we want to hear from you! Apply today to join reputed company and be part of a community that is driving innovation in the tech industry.

reputed company is an equal opportunity employer and welcomes applications from diverse candidates. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
