Lead Site Reliability Engineer (SRE) Hyderabad, India

Description

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a wide range of innovative projects that deliver creative, cutting-edge solutions, and have the opportunity to learn and grow continuously. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a talented and motivated Lead Site Reliability Engineer to join our team. As a key member of our multidisciplinary team, you will play a crucial role in ensuring the reliability, performance, and security of our complex distributed systems. If you are passionate about operational risk management, have a deep understanding of Kubernetes and containers, and possess strong problem-solving skills, this role offers an exciting opportunity to contribute to the success of our operations.

Responsibilities

  • Ability to rapidly and effectively understand and translate requirements into technical solutions
  • Ability to reason about performance, security, and process interactions in complex distributed systems, with a passion for managing operational risk
  • Ability to work effectively as part of a diverse, multidisciplinary team
  • Motivation, self-organization, and good time and work management skills

Requirements

  • Should have 8 to 12 years of experience as a Site Reliability Engineer
  • Must have expert/intermediate level knowledge of Azure (preferred) or AWS/GCP cloud infrastructure, networking, security, and storage. (GCP will be decommissioned in the coming days, so Azure alone is also fine.)
  • Must have intermediate level Python core skills
  • Must have expert/intermediate level Python, cloud, and Windows administration debugging skills
  • Must have intermediate level knowledge of Windows or Linux administration. (Linux alone is also acceptable; two weeks of Windows administration training can be provided.)
  • Good to have expert/intermediate level knowledge of infrastructure and application monitoring and related tools (ELK/OpsBridge/Dynatrace)
  • Good to have observability and centralized logging experience
  • Good to have knowledge of incident management (PagerDuty/Opsgenie/VictorOps)
  • Good to have knowledge of change management
  • Good to have knowledge of SLOs, SLIs, and SLAs
  • Good to have knowledge of Kubernetes and Docker
  • Good to have knowledge of CI/CD (especially Azure DevOps)

We offer

  • Opportunity to work on technical challenges that may have an impact across geographies
  • Vast opportunities for self-development: online university, knowledge sharing opportunities globally, learning opportunities through external certifications
  • Opportunity to share your ideas on international platforms
  • Sponsored Tech Talks & Hackathons
  • Unlimited access to LinkedIn Learning solutions
  • Possibility to relocate to any EPAM office for short and long-term projects
  • Focused individual development
  • Benefit package:
    • Health benefits
    • Retirement benefits
    • Paid time off
    • Flexible benefits
  • Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)
