Mid-Level

Site Reliability Engineer - Azure (4-8yrs)

PhonePe

Bangalore
On-site
Posted April 24, 2026

Job Description

About PhonePe Limited:

PhonePe Limited is headquartered in India. Its flagship product, the PhonePe digital payments app, was launched in August 2016. As of April 2025, PhonePe has over 60 Crore (600 Million) registered users and a digital payments acceptance network spread across over 4 Crore (40+ Million) merchants. PhonePe also processes over 33 Crore (330+ Million) transactions daily, with an Annualized Total Payment Value (TPV) of over INR 150 lakh crore.

PhonePe’s portfolio of businesses includes the distribution of financial products (Insurance, Lending, and Wealth) as well as new consumer tech businesses in India (Pincode, a hyperlocal e-commerce platform, and Indus Appstore, a localized app store for the Android ecosystem). These are aligned with the company’s vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture:

At PhonePe, we go the extra mile to make sure you can bring your best self to work every day! That starts with creating the right environment for you. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. PhonePe-rs solve complex problems and execute quickly, often building frameworks from scratch. If you’re excited by the idea of building platforms that touch millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

Summary

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to manage, scale, and ensure the high availability of our core infrastructure. This role involves deep expertise in cloud services, automation, monitoring, and complex networking to support a high-volume, mission-critical environment.

Key Responsibilities

  • Cloud & Infrastructure: Configure, maintain, and manage services and packages on Ubuntu Virtual Machines in Azure. Design and manage Azure components for log storage, management, alerting, and monitoring.
  • Networking & Connectivity: Configure and maintain complex network components including Azure Firewall, Route Tables, Virtual Network Gateways, and ExpressRoute. Establish and manage IPsec and ExpressRoute connectivity with external environments. Manage routing, troubleshoot connectivity issues, and support network component migrations with minimal downtime.
  • Automation & IaC: Drive automation for all business-as-usual (BAU) tasks using Terraform, SaltStack, Ansible, and scripting languages. Write new Terraform code for infrastructure components.
  • Database & Data Management: Set up and manage high-availability services like MySQL and Aerospike. Implement database replication across regions, manage migrations, and ensure data sync. Handle backups of databases, logs, and configurations.
  • Monitoring & Observability: Implement and manage monitoring (e.g., Prometheus, VictoriaMetrics, Riemann) and centralized logging (Loki) solutions, with visualization in Grafana. Troubleshoot performance and system issues at the OS, platform, or application level.
  • Security & Compliance: Manage firewalls and integrate platform and VM-level services with the SOC. Collaborate with Infosec teams to evaluate and fix security vulnerabilities.
  • Capacity & Performance: Conduct proactive capacity planning. Manage critical infrastructure components like Nginx, HAProxy, Docker, and RMQ.
  • Incident Management & DR: Participate in an on-call rotation. Structure and lead incident response, Root Cause Analysis (RCA), and post-mortem creation. Set up and support planning and