Site Reliability Engineer (SRE) - 100% Remote (Full-time)
• 10-12 years of experience.
• 8+ years in leadership roles managing large-scale SRE programs.
• Deep understanding of cloud-native architectures (AWS, Azure, GCP), microservices, and distributed systems.
• Proficiency with Application Performance Monitoring (APM) tools (New Relic, Dynatrace) for monitoring, logging, and tracing, and with Splunk for log monitoring.
• Expertise in observability tools (e.g., Prometheus, Grafana, Datadog), CI/CD pipelines, and infrastructure as code (Terraform, Ansible).
• Strong experience with incident response, chaos engineering, and reliability testing.
• Proven ability to influence cross-functional teams and drive organizational change.