Mid-Level

Site Reliability Engineer

Alloy

New York City
Hybrid
Posted April 6, 2026

Job Description

Alloy is where you belong!

Alloy helps solve the identity risk problem for companies that offer financial products by enabling them to outpace fraud and confidently serve more people around the world. Over 800 of the world’s largest financial institutions and fintechs turn to Alloy to take control of fraud, credit, and compliance risk, and grow with the clearest picture of their customers.

Through our values (Be Bold, Get Scrappy, Collaborate, and Celebrate Our Differences), we are creating a workplace where you can grow, thrive, and belong. See how we’ve been recognized year after year, named one of Inc. Magazine’s Best Workplaces, one of Forbes America’s Best Startup Employers, and a Best Fintech to Work for by American Banker.

Check out our investors and read more about us here.

About the team

Alloy’s Infrastructure Team is a small team (6 engineers) responsible for a large and growing infrastructure footprint: 15+ Kubernetes clusters, 100+ databases, dozens of services, and complex data organization.

Our challenge isn’t just scale—it’s making that scale reliable, secure, and operable with less manual work.

We’re looking for engineers who enjoy turning complex, fragile systems into automated, self-service platforms with strong safety guarantees.

What you'll be doing

Reporting to the Engineering Manager of Infrastructure, you'll:

  • Design and build systems to automate infrastructure management at scale (provisioning, upgrades, migrations)
  • Reduce operational toil by turning manual processes into reliable, repeatable workflows
  • Build internal tooling and platforms that enable safe self-service changes for other engineers
  • Improve the reliability and resilience of our infrastructure (Kubernetes, databases, services)
  • Implement and evolve systems for deploying and running applications in Kubernetes
  • Contribute to architecture decisions across infrastructure, reliability, and security
  • Write and review production-quality code
  • Participate in on-call rotations—but focus on building systems that prevent incidents, not just respond to them

Who we’re looking for

  • 5+ years of experience in infrastructure, SRE, or software engineering roles
  • Strong software engineering skills—you build systems, not just scripts
  • Experience managing production infrastructure at scale (cloud + containerized systems)
  • Experience with Infrastructure as Code (e.g., Terraform)
  • Experience running and troubleshooting distributed systems (Docker/Kubernetes)
  • Experience with observability and debugging tools (Datadog, CloudWatch, ELK/EFK, etc.)
  • Proficiency in at least one programming language (Python, Go, JavaScript, etc.)
  • Experience participating in on-call rotations and improving systems based on incidents
  • Strong communication and collaboration skills

You might be a great fit if you

  • Default to automation over manual processes
  • See repetitive work and immediately want to eliminate it
  • Think in terms of