Staff Site Reliability Engineer, Kubernetes w/ active TS/SCI
Okta
Compensation
$188,000 - $235,000/year
Job Description
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.
This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.
About the Team
At Okta, our motto is "Always On." Within the Technical Operations (TechOps) team, we live this mission by building the most reliable and performant systems on the planet. We empower organizations to do their most significant work by securely connecting any person, on any device, to the technologies they need.
The Role
We are looking for an experienced Senior Site Reliability Engineer (SRE) who thrives on the challenge of managing large-scale cloud production systems. The ideal candidate is a self-starter who lives by the ethic: "If you have to do it twice, automate it." Based in the Washington, D.C. area, with on-site customer travel, you will ensure our infrastructure maintains uncompromising reliability and performance while supporting the most sensitive national security missions.
Security Requirement: Must be able to obtain and maintain a U.S. security clearance (Secret or Top Secret) to the extent required by U.S. Government contracts.
The selected candidate may be subject to drug testing to the extent required by U.S. Government contracts.
What You’ll Do
- Infrastructure Excellence: Design, deploy, and monitor Okta’s production infrastructure to ensure peak performance and reliability.
- Incident Management: Serve as a frontline responder to production incidents, performing deep-dive troubleshooting and implementing permanent preventive solutions.
- Aggressive Automation: Eliminate manual toil by developing automation scripts, evolving monitoring tools, and documenting technical workflows.
- Scalability: Support a highly available, large-scale environment as part of an on-call rotation, ensuring "Always On" service delivery.
What You’ll Bring
Core Requirements
- Clearance & Citizenship: Active TS/SCI clearance.
- Federal Compliance: Deep familiarity with FedRAMP and DoD IL6 compliance standards.
- Education: B.S. in Computer Science or equivalent professional experience.
Technical Expertise
- Kubernetes Mastery: 5+ years of experience building and operating workloads orchestrated by Kubernetes, including expert-level debugging of Helm values and charts.
- Systems & Scripting: Strong Linux systems administration background with proficiency in Go, Python, Bash, or Ruby.
- Cloud Infrastructure: Expertise in AWS services (EC2, ECS, KMS, CloudWatch) and Infrastructure as Code (Terraform or CloudFormation).
- Production Support: Experience managing Docker containers and web applications (Java/Apache/Tomcat) in high-traffic live environments.
- Networking: Solid understanding of networking concepts and IP protocols; experience with multi-cloud environments is a significant plus.
P24510