Site Reliability Engineer (Job Code: J45032)

 Job Summary
Experience: 8 - 11 Years
Role: Site Reliability Engineer
Educational Level: BE-Comp/IT, BE-Other
Stream of Study: Computer Science/IT
Industrial Type: IT-Software/Software Services
Functional Area: IT Software - Application Programming / Maintenance
Key Skills: SRE, Kubernetes
Job Post Date: 2022-04-29 12:06:50

 Company Description
Our client is an information technology services firm that has a rich history of providing comprehensive technology services and solutions for five decades.

As a pioneer in IT services, we’ve partnered with some of the biggest global corporations across many industries. Our history was built on a foundation of partnerships with global brands like McDonald’s, Microsoft, CIT Group, Johnson & Johnson, Herbalife, Sony Pictures Entertainment, and many others. Whether it’s providing dedicated support centers, staffing quality teams, or delivering business service solutions, clients can always count on us.

 Job Description
• 8+ years of overall experience, including 2+ years in an SRE role working with Kubernetes.
• Strong experience as an SRE in monitoring, troubleshooting, and supporting Kubernetes container platforms.
• Support rapid development and engineering productivity via release engineering, CI/CD automation, and build tools.
• Perform health checks on applications and infrastructure to identify and proactively pre-empt issues (verification, alerts, etc.).
• Work closely with engineering or DevOps teams to debug and fix issues as they arise.
• Work on development tasks and tools for infrastructure, deployment, monitoring, etc.
• Participate in on-call rotations and be responsible for infrastructure and platform level escalations.
• Work with the DevOps team to plan and implement infrastructure capacity, upgrades, and monitoring.
• Participate in daily (standup) production reviews.
• Contribute to the design and improvement of the deployment architecture of new and existing applications, based on the principles of reliability, high availability, efficiency, and observability.
• Research, learn, adopt, customize, and create tools to improve the observability, resilience, and usability of applications in scope.
• Create and maintain SRE-related documentation (solution repository, Root Cause Analysis reports, etc.).