Staff Escalation DevOps Engineer
Zscaler
Job Description
About Zscaler
Zscaler is a pioneer and global leader in zero trust security. The world’s largest businesses, critical infrastructure organizations, and government agencies rely on Zscaler to secure users, branches, applications, data & devices, and to accelerate digital transformation initiatives. Distributed across more than 160 data centers globally, the Zscaler Zero Trust Exchange platform combined with advanced AI combats billions of cyber threats and policy violations every day and unlocks productivity gains for modern enterprises by reducing costs and complexity.
Here, impact in your role matters more than title and trust is built on results. We believe in transparency and value constructive, honest debate—we’re focused on getting to the best ideas, faster. We build high-performing teams that can make an impact quickly and with high quality. To do this, we are building a culture of execution centered on customer obsession, collaboration, ownership and accountability.
We champion an “AI Forward, People First” philosophy to help us accelerate and innovate, empowering our people to embrace their potential. If you’re driven by purpose, thrive on solving complex challenges and want to make a positive difference on a global scale, we invite you to bring your talents to Zscaler to help shape the future of cybersecurity.
Role
We are looking for a Staff Escalation DevOps Engineer to join our Shared Platform Services team. This is a hybrid role based in Bangalore, reporting to the Sr. Manager Software Engineering. Our Engineering team built the world's largest cloud security platform from the ground up, and we continue to scale. With more than 100 patents and extensive plans for enhancing services, our multitenant architecture has established us as the cloud security leader for over 15 million users worldwide. We invite you to bring your vision and passion to our team of architects and engineers who are enabling global organizations to harness speed and agility with a cloud-first strategy.
What you’ll do (Role Expectations)
- Troubleshooting, debugging, and resolving customer or production cloud incidents that have been escalated by the support team
- Taking ownership of escalated problems, performing impact analysis, implementing solutions, and communicating with affected parties
- Working with development, security, and operations teams to find and implement fixes for complex system problems
- Monitoring PagerDuty alerts for system health, performance, and security while augmenting alerts to meet SLOs
- Creating diagnostic tools, dashboards, and documentation to help other engineers resolve issues more effectively
Who You Are (Success Profile)
- You thrive in ambiguity and are comfortable building the path as you walk it, seeing dynamic environments as the raw material to build something meaningful.
- You act like an owner with a passion for the mission and a bias for action, navigating seamlessly between high-level strategy and hands-on execution.
- You are a problem-solver who is energized by finding solutions to the hardest challenges, knowing that solving them delivers the biggest impact.
- You are a learner with a growth mindset who actively seeks feedback to become a better partner and a stronger teammate.
- You are driven by innovation and have a deep curiosity for how things work, always seeking a better, more secure, and scalable way to accelerate transformation.
What We’re Looking for (Minimum Qualifications)
- Strong skills in scripting with Python or Bash, Java programming, cloud platforms like GCP/AWS/Azure, and configuration management tools such as Terraform or Ansible
- 5+ years of experience triaging and solving critical customer-facing production issues with live troubleshooting and debugging
- Experience deploying production releases in Google Cloud (GCP) and data center environments using CI/CD pipelines
- Proficiency with authentication protocols such as SAML and OAuth and networking protocols such as TCP/IP, UDP, and ICMP, using debugging tools such as Postman or packet captures
- Strong knowledge of monitoring and alerting tools like Grafana and Kloudfuse to set up dashboards for essential service metrics