Lead / Manager

Senior Manager, Site Reliability Engineering

Ping Identity

USA - Remote
Posted April 7, 2026

Job Description

About Ping Identity: 

At Ping Identity, we believe in making digital experiences both secure and seamless for all users, without compromise. We call this digital freedom. And it's not just something we provide our customers. It's something that inspires our company. People don't come here to join a culture that's built on digital freedom. They come to cultivate it. 

Our intelligent, cloud identity platform lets people shop, work, bank, and interact wherever and however they want. Without friction. Without fear. 

While protecting digital identities is at the core of our technology, protecting individual identities is at the core of our culture. We champion every identity. One of our core values, Respect Individuality, reminds us to celebrate differences so you are empowered to bring your authentic self to work. 

We're headquartered in Denver, Colorado and we have offices and employees around the globe. We serve the largest, most demanding enterprises worldwide, including more than half of the Fortune 100. At Ping Identity, we're changing the way people and businesses think about cybersecurity, digital experiences, and identity and access management. 

As a Senior Manager of SRE at Ping Identity, you will be involved in every facet of our on-demand SaaS services and will build, deploy, and maintain the infrastructure of one of the largest identity platforms in the world. We follow a DevOps model: our teams are integrated with development teams and run continuous deployments daily, and SREs are expected to provide input on the product's design, development, deployment, and operations.

Working within the Cloud Services team, you'll manage a team of SREs who build automated infrastructure and deployments. You'll be the expert on operational excellence and on how to build systems that are redundant, scalable, and observable.

Responsibilities:

  • Provide leadership and mentorship to a team of 8-10 Site Reliability Engineers (SREs).
  • Define, measure, and report on key Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure adherence to the 99.99%+ uptime Service Level Agreement (SLA).
  • Collaborate effectively with other SRE, Security, and Development teams.
  • Define and implement processes to ensure the team efficiently meets target deadlines.
  • Drive the successful completion of large-scale projects, coordinating with multiple Development teams.
  • Conduct thorough capacity analysis and planning.
  • Effectively manage and scale infrastructure by establishing and adhering to automation standards.
  • Analyze and resolve complex system behavior, performance, and application issues.
  • Oversee comprehensive observability and analysis across multiple datacenters.

Requirements:

  • Minimum of six years of experience leading a software-focused Site Reliability Engineering (SRE) team of eight to ten staff.
  • Demonstrated experience working within organizations operating on a global scale.
  • Proven ability to drive strategic decisions regarding "build vs. buy" technology choices.
  • Proficiency in developing, maintaining, and administering modern infrastructure tooling, with a strong emphasis on Infrastructure as Code (IaC) principles.
  • Experience provisioning public cloud resources utilizing IaC tools such as CloudFormation and Terraform.
  • Solid knowledge of scripting and programming standards (e.g., Python, Ruby, Bash, Go).
  • Experience with Docker and container orchestration platforms (e.g., Kubernetes).
  • Practical experience using Git in a large-scale team environment.
  • Understanding and application of security design principles.
  • Experience operating within a high-volume or mission-critical production service environment.
  • Expertise in IP networking, including familiarity with network functionality, operational procedures, and failure modes.

Nice to Have:

  • Familiarity with observability tooling such as New Relic, Splunk, Grafana, and CloudWatch.
  • Famili