Production Support (Java or SRE)
Job Title: Tech Lead/Engineers Production Support & Platform (Java Developers and SRE Engineers)
Location: Richmond, McLean, or Remote
Key Responsibilities:
• Tech Lead - Lead and mentor a team of 15+ engineers for production support and platform stability.
• Engineer - Spend more than 50% of time on production support and deployment, followed by application development and migration work based on sprint scope.
• Manage pager duty rotations and ensure timely incident resolution.
• Provide Level 4 support (deep technical troubleshooting and fixes).
• Oversee rotational night shifts (approximately once every 2.5 months).
• Ensure compliance with SLAs and operational excellence for critical systems.
• Collaborate with stakeholders for platform strategy and migration planning.
• Drive Run-the-Engine development work and support enhancements.
• Prepare for and lead the platform migration phase in the third year.
• Monitor application performance, batch jobs, and system health across production and lower environments.
• Respond to incidents, alerts, and Sev1/Sev2 outages, and provide real-time support following the bank's Incident Management processes.
• Perform root cause analysis (RCA), create remediation plans, and ensure issues are permanently resolved.
• Support on-call rotations and pager duty responsibilities.
• Collaborate with development, SRE, and infrastructure teams to troubleshoot application, database, and integration issues.
• Execute deployments, configuration changes, and release support using CI/CD pipelines (OnePipeline preferred).
• Create/maintain operational dashboards, runbooks, SOPs, and automation scripts.
• Ensure compliance with bank technology and security standards.
Required Skills & Experience:
• Tech Lead - Ability to manage large teams (10-50 members) and complex platforms.
• Java Development and Site Reliability Engineering (SRE) expertise.
• Strong experience in production support and incident management.
• Hands-on experience with pager duty tools and support workflows.
• Excellent problem-solving and communication skills.
• Minimum 2 years of experience in similar roles.
• Strong experience in Unix/Linux, shell scripting, and troubleshooting distributed systems.
• Hands-on experience with AWS (CloudWatch, Lambda, EC2, S3, IAM, RDS, DynamoDB).
• Familiarity with Java-based applications, microservices, APIs, and log analysis (Splunk, CloudWatch Logs).
• Experience with CI/CD tools like Jenkins, OnePipeline, Git, and automated deployment strategies.
• Knowledge of incident management, problem management, and change management processes.
• Strong analytical skills and the ability to quickly diagnose complex issues.