Mid-Level

Site Reliability Engineer 2/3 - Cloud (5 to 12 Years)

PhonePe

Bangalore
Hybrid
Posted March 12, 2026

Job Description

About PhonePe Limited:

PhonePe Limited is headquartered in India. Its flagship product, the PhonePe digital payments app, was launched in August 2016. As of April 2025, PhonePe has over 60 Crore (600 Million) registered users and a digital payments acceptance network spread across over 4 Crore (40+ Million) merchants. PhonePe also processes over 33 Crore (330+ Million) transactions daily, with an Annualized Total Payment Value (TPV) of over INR 150 lakh crore.

PhonePe’s portfolio of businesses includes the distribution of financial products (Insurance, Lending, and Wealth) as well as new consumer tech businesses in India (Pincode, for hyperlocal e-commerce, and Indus AppStore, a localized app store for the Android ecosystem). These are aligned with the company’s vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture:

At PhonePe, we go the extra mile to make sure you can bring your best self to work, every day! And that starts with creating the right environment for you. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. PhonePe-rs solve complex problems and execute quickly, often building frameworks from scratch. If you’re excited by the idea of building platforms that touch millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

Job Summary

We are seeking a highly motivated Site Reliability Engineer (SRE) with 5 to 12 years of experience to manage, scale, and ensure the high availability of our core infrastructure. This role is open to experts specializing in either Microsoft Azure or AWS. You will be responsible for deep-level cloud architecture, automation, and complex networking to support a high-volume, mission-critical environment where downtime is not an option.

Key Responsibilities

Cloud & Infrastructure Management

  • Cloud Operations: Configure, maintain, and manage Ubuntu/Linux Virtual Machines in your primary cloud environment (Azure or AWS).
  • Managed Services: Design and manage cloud-native components for log storage, database management, and alerting (e.g., Azure Storage/ADX or AWS S3/CloudWatch).

Networking & Connectivity

  • Complex Networking: Configure and maintain critical network components, including Firewalls, Route Tables, and Virtual Gateways (VPC/VNet).
  • Hybrid Links: Establish and manage high-speed connectivity via ExpressRoute (Azure) or Direct Connect (AWS), along with IPsec VPNs for external environments.
  • Troubleshooting: Resolve complex routing issues and manage network migrations with zero-to-minimal downtime.

Automation & Infrastructure as Code (IaC)

  • Everything as Code: Drive automation for all BAU (Business As Usual) tasks using Terraform, writing new code for all infrastructure components.
  • Config Management: Use SaltStack or Ansible for automated deployment and configuration of services on VMs.
  • Tooling: Develop custom scripts or services in Python, Go, or Java to eliminate manual toil.

Database & Data Management

  • High Availability: Set up and manage HA services like MySQL and Aerospike.
  • Global Replication: Implement database replication acros