Senior Site Reliability Engineer Hyderabad, India

Description

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a talented and motivated Senior Site Reliability Engineer (SRE) to join our organization.

The experienced SRE will play a crucial role in ensuring the reliability, scalability, capacity planning, and performance of our infrastructure and applications. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.


Responsibilities

  • Ensure system stability and high availability by proactively monitoring performance and troubleshooting issues
  • Design, build and maintain efficient, reliable, and scalable cloud-based infrastructure and services
  • Automate repetitive tasks and workflows with scripting and programming languages to improve efficiency and reduce errors
  • Implement and manage observability tools for comprehensive monitoring, alerting, and logging
  • Develop and execute automation strategies using tools like Jenkins, GitLab, and Ansible/Chef
  • Define and track SLIs, SLOs, SLAs, and error budgets to maintain service quality
  • Provide on-call support for incident management and participate actively in response activities

Requirements

  • 5 to 8 years of experience
  • Proficiency in scripting/programming languages (Python, Bash, PowerShell, etc.) for automating manual work, particularly within cloud environments
  • Proficiency with observability tools (Grafana, Splunk, Dynatrace) for monitoring, alerting, and logging, used to identify and address potential issues, especially in cloud infrastructure
  • Working experience with automation tools (Jenkins, GitLab, and Ansible/Chef for configuration management) and processes that streamline the deployment, monitoring, and management of systems and applications in the cloud
  • Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar, particularly in cloud-native environments
  • Solid understanding of SLI, SLO, SLA, and error budget concepts and their implementation
  • Willingness to provide on-call support and participate in incident management and response activities as needed

We offer

  • Opportunity to work on technical challenges with impact across geographies
  • Vast opportunities for self-development: online university, knowledge sharing opportunities globally, learning opportunities through external certifications
  • Opportunity to share your ideas on international platforms
  • Sponsored Tech Talks & Hackathons
  • Unlimited access to LinkedIn Learning solutions
  • Possibility to relocate to any EPAM office for short- and long-term projects
  • Focused individual development
  • Benefit package:
    • Health benefits
    • Retirement benefits
    • Paid time off
    • Flexible benefits
  • Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)
