Senior Site Reliability Engineer, Edge - TS/SCI
Okta
Compensation
$159,000 - $218,900/year
Job Description
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.
This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.
About the Team
At Okta, our motto is "Always On." Within the Technical Operations (TechOps) team, we live this mission by building the most reliable and performant systems on the planet. We empower organizations to do their most significant work by securely connecting any person, on any device, to the technologies they need.
The Role
We are seeking a Senior Site Reliability Engineer (SRE) to lead the evolution of our large-scale production systems. This role is designed for a technical expert who thrives on solving complex problems at scale and lives by the ethic: "If you have to do it twice, automate it." Based in the Washington, D.C. area, you will ensure our infrastructure maintains uncompromising reliability and performance while supporting critical national security missions in secure, restricted environments.
Security Requirement: Must be able to obtain and maintain a U.S. security clearance (Secret or Top Secret) to the extent required by U.S. Government contracts.
The selected candidate may be subject to drug testing to the extent required by U.S. Government contracts.
What You’ll Do
- Infrastructure Leadership: Design, build, and oversee Okta’s production infrastructure, ensuring architectural integrity and peak performance.
- Incident Engineering: Act as a senior escalation point for production incidents, conducting deep-dive root cause analysis and implementing permanent, automated preventive solutions.
- Strategic Automation: Eliminate manual toil by developing sophisticated automation frameworks, evolving monitoring tools, and establishing rigorous technical documentation.
- System Resilience: Optimize a highly available, large-scale environment, ensuring "Always On" service delivery across complex network topologies.
- Mentorship: Provide technical guidance to the engineering organization, championing SRE best practices and a culture of self-education.
What You’ll Bring
Core Requirements
- Clearance: Active TS/SCI with Polygraph.
- Compliance Expertise: Deep professional experience with FedRAMP and DoD IL6 frameworks.
- Education: B.S. in Computer Science or equivalent technical experience.
Technical Expertise
- Networking & Cloud Architecture: Mastery of AWS networking and security, including Transit Gateways, VPCs, Route Tables, ELBs, and NACLs.
- Infrastructure as Code (IaC): Advanced experience automating enterprise-scale infrastructure via Terraform or CloudFormation.
- Systems & Scripting: Expert-level Linux systems administration with proficiency in Go, Python, Bash, or Ruby.
- Production Support: Proven success managing Docker containers and Java-based stacks (Apache/Tomcat) in high-security production environments.
- Protocol Knowledge: Solid understanding of networking concepts, IP protocols, and multi-cloud infrastructure.