Site Reliability Engineer

Alloy

New York City

Hybrid

Posted April 6, 2026

Job Description

Alloy is where you belong!

Alloy helps solve the identity risk problem for companies that offer financial products by enabling them to outpace fraud and confidently serve more people around the world. Over 800 of the world’s largest financial institutions and fintechs turn to Alloy to take control of fraud, credit, and compliance risk, and grow with the clearest picture of their customers.

Through our values (Be Bold, Get Scrappy, Collaborate, and Celebrate Our Differences), we are creating a workplace where you can grow, thrive, and belong. Year after year, we have been named one of Inc. Magazine’s Best Workplaces, one of Forbes America’s Best Startup Employers, and a Best Fintech to Work For by American Banker.

About the team

Alloy’s Infrastructure Team is a small team (6 engineers) responsible for a large and growing infrastructure footprint: 15+ Kubernetes clusters, 100+ databases, dozens of services, and complex data organization.

Our challenge isn’t just scale—it’s making that scale reliable, secure, and operable with less manual work.

We’re looking for engineers who enjoy turning complex, fragile systems into automated, self-service platforms with strong safety guarantees.

What you'll be doing

Reporting to the Engineering Manager of Infrastructure, you'll:

Design and build systems to automate infrastructure management at scale (provisioning, upgrades, migrations)
Reduce operational toil by turning manual processes into reliable, repeatable workflows
Build internal tooling and platforms that enable safe self-service changes for other engineers
Improve the reliability and resilience of our infrastructure (Kubernetes, databases, services)
Implement and evolve systems for deploying and running applications in Kubernetes
Contribute to architecture decisions across infrastructure, reliability, and security
Write and review production-quality code
Participate in on-call rotations—but focus on building systems that prevent incidents, not just respond to them

Who we’re looking for

5+ years of experience in infrastructure, SRE, or software engineering roles
Strong software engineering skills—you build systems, not just scripts
Experience managing production infrastructure at scale (cloud + containerized systems)
Experience with Infrastructure as Code (e.g., Terraform)
Experience running and troubleshooting distributed systems (Docker/Kubernetes)
Experience with observability and debugging tools (Datadog, CloudWatch, ELK/EFK, etc.)
Proficiency in at least one programming language (Python, Go, JavaScript, etc.)
Experience participating in on-call rotations and improving systems based on incidents
Strong communication and collaboration skills

You might be a great fit if you