Staff Infrastructure Engineer

Flex

Compensation

$170,000 - $250,000/year

Remote (U.S.)

Hybrid

Posted April 2, 2026

Job Description

Flex is a growth-stage, NYC-headquartered FinTech company that is creating the best rent payment experience. It’s hard to believe that it’s 2026 and paying rent on time is still expensive, inflexible, and difficult. We’re here to change that! Flex enables our users to pay rent throughout the month on a schedule that better fits their finances and budget. Our mission is to empower as many renters as possible with flexibility over their most significant recurring expense. After deliberately keeping a stealth profile while we built up unprecedented investor support and an enthusiastic user base, we are looking for motivated individuals to help us advance our mission. Will you be a part of the team?

About the role

Flex is looking for a Staff Infrastructure Engineer to lead technical direction across our infrastructure platform, setting the strategy for how we build, deploy, and operate reliable systems at scale.

In this role, you will lead projects within the Infrastructure Engineering team, partnering with engineering leaders across the org to align infrastructure investments with business priorities and drive the standards and practices that raise the bar for the wider engineering department. You will shape how we approach reliability, developer experience, and infrastructure as code, and you will be expected to identify when current approaches are not working and redirect effort toward higher-impact outcomes.

At Flex, we are an AI-first engineering organization. We use AI-assisted tools to move faster and improve quality, while maintaining strong human ownership of correctness, security, and reliability. We’re looking for engineers who combine strong technical judgment with practical, execution-focused delivery.

We are particularly interested in candidates with software engineering experience in languages like Java, Python, or TypeScript. This background helps you collaborate effectively with product and platform teams, build internal tooling, and improve developer experience.

This remote role requires a minimum of 8 years of cloud infrastructure experience.

What you’ll do

Lead infrastructure project teams across multiple domains (reliability, developer experience, cloud platform), providing technical direction, maintaining project plans, and keeping leadership and cross-functional stakeholders informed of progress, risks, and tradeoffs.
Partner with engineering leaders and peer Staff+ engineers across the org to align infrastructure strategy and technical investments with business goals, and provide authoritative technical scoping for cross-functional initiatives.
Architect and deliver large, complex infrastructure systems, designing for scale, reliability, and operational simplicity. Drive decisions on build-vs-buy, technology selection, and migration strategy for the domains you lead.
Define and evolve Flex's infrastructure-as-code strategy, including Terraform module architecture, governance standards, and safe rollout patterns. Introduce new IaC tooling or frameworks when existing approaches no longer serve team needs, and drive adoption across engineering.
Lead strategic reliability improvements across services you work with, defining SLI/SLO frameworks with partner teams, delivering net-new ways to measure and communicate operational health and customer impact, and driving sustained reliability gains rather than one-off fixes.
Shape the developer platform strategy, identifying the highest-leverage investments in self-service tooling, CI/CD, and deployment automation. Set the quality bar for developer-facing infrastructure and ensure the team ships tooling that meaningfully accelerates engineering velocity.
Design cross-service observability architectures (metrics, logs, traces) with clear operational standards. Lead strategic alerting and runbook improvements that reduce mean-time-to-detect and mean-time-to-resolve across the org.
Drive systemic incident resilience: lead cross-team infrastructure incident response, identify recurring failure patterns, and own the follow-through that turns post-incident findings into durable infrastructure improvements. Proactively refocus team efforts when reliability projects are off-course or not delivering meaningful risk reduction.
Build engineering rigor into team processes, improving design review standards, deployment checklists, operational readiness criteria, and code quality practices. Set a high bar and coach the team to consistently meet it.
Design AI-assisted workflows for your