Site Reliability Engineer
Alloy
Job Description
Alloy is where you belong!
Alloy helps solve the identity risk problem for companies that offer financial products by enabling them to outpace fraud and confidently serve more people around the world. Over 800 of the world’s largest financial institutions and fintechs turn to Alloy to take control of fraud, credit, and compliance risk, and grow with the clearest picture of their customers.
Through our values (Be Bold, Get Scrappy, Collaborate, and Celebrate Our Differences), we are creating a workplace where you can grow, thrive, and belong. Year after year, we have been recognized as one of Inc. Magazine’s Best Workplaces, one of Forbes America’s Best Startup Employers, and a Best Fintech to Work For by American Banker.
About the team
Alloy’s Infrastructure Team is a small team (6 engineers) responsible for a large and growing infrastructure footprint: 15+ Kubernetes clusters, 100+ databases, dozens of services, and complex data organization.
Our challenge isn’t just scale—it’s making that scale reliable, secure, and operable with less manual work.
We’re looking for engineers who enjoy turning complex, fragile systems into automated, self-service platforms with strong safety guarantees.
What you'll be doing
Reporting to the Engineering Manager of Infrastructure, you'll:
- Design and build systems to automate infrastructure management at scale (provisioning, upgrades, migrations)
- Reduce operational toil by turning manual processes into reliable, repeatable workflows
- Build internal tooling and platforms that enable safe self-service changes for other engineers
- Improve the reliability and resilience of our infrastructure (Kubernetes, databases, services)
- Implement and evolve systems for deploying and running applications in Kubernetes
- Contribute to architecture decisions across infrastructure, reliability, and security
- Write and review production-quality code
- Participate in on-call rotations—but focus on building systems that prevent incidents, not just respond to them
Who we’re looking for
- 5+ years of experience in infrastructure, SRE, or software engineering roles
- Strong software engineering skills—you build systems, not just scripts
- Experience managing production infrastructure at scale (cloud + containerized systems)
- Experience with Infrastructure as Code (e.g., Terraform)
- Experience running and troubleshooting distributed systems (Docker/Kubernetes)
- Experience with observability and debugging tools (Datadog, CloudWatch, ELK/EFK, etc.)
- Proficiency in at least one programming language (Python, Go, JavaScript, etc.)
- Experience participating in on-call rotations and improving systems based on incidents
- Strong communication and collaboration skills
You might be a great fit if you
- Default to automation over manual processes
- See repetitive work and immediately want to eliminate it
- Think in terms of