Senior Site Reliability Engineer

Attentive

United States

On-site

Posted April 3, 2026

Job Description

Attentive® is the AI marketing platform for 1:1 personalization, redefining the way brands and people connect. We’re the only marketing platform that combines powerful technology with human expertise to build authentic customer relationships. By unifying SMS, RCS, email, and push notifications, our AI-powered personalization engine delivers bespoke experiences that drive performance, revenue, and loyalty through real-time behavioral insights.

Recognized as the #1 provider in SMS Marketing by G2, Attentive partners with more than 8,000 customers across 70+ industries. Leading global brands like Crate and Barrel, Urban Outfitters, and Carter’s work with us to enable billions of interactions that power tens of billions in revenue for our customers.

With a distributed global workforce and employee hubs in New York City, San Francisco, London, and Sydney, Attentive’s team has been consistently recognized for its performance and culture. We’re proud to be included in Deloitte’s Fast 500 (four years running!), LinkedIn’s Top Startups, Forbes’ Cloud 100 (five years running!), Inc.’s Best Workplaces, and the Human Rights Campaign Foundation's Corporate Equality Index!

About the Role

What You’ll Accomplish

Design and deliver high-impact solutions: Design and implement systems that enhance reliability, observability, traceability, and incident management, ensuring the platform scales effectively
Lead execution on key projects: Take ownership of projects, driving them from discovery through execution
Partner across teams: Collaborate with engineers from AI/ML, Data, Platform, and Product teams to develop best-in-class platforms and services
Establish standards and best practices: Define and enforce production standards, processes, and tools to ensure operational excellence
Champion reliability goals: Advocate for and implement SLIs, SLOs, and other reliability-focused metrics across the engineering organization
Mentor and share knowledge: Guide and mentor junior team members, fostering technical growth and helping to develop the next generation of engineering leaders
Innovate and inspire: Drive continuous improvement by bringing creative ideas and challenging the status quo

Your Expertise

5+ years of experience in Production Engineering, SRE, Platform Engineering, DevOps, Backend Engineering, or similar roles
Strong coding ability in at least one language (e.g., Golang, Python, Java, TypeScript) with the capability to solve complex issues through code
Experience with cloud-native technologies and Infrastructure-as-Code (e.g., Kubernetes, Terraform, AWS)
Demonstrated experience delivering medium to large-scale projects that drive meaningful improvements in platform reliability and scalability
Deep understanding of production reliability concepts, including SLIs, SLOs, and incident management
Proficient in designing and maintaining CI/CD pipelines, deployment strategies, and release automation to enable fast, safe delivery
Fluency in frontier AI-assisted development tools and agents (Claude Code, Codex, Cursor, or similar)
Excellent verbal and written communication skills with the ability to collaborate across technical and non-technical teams
Familiarity with working in dynamic, reliability-focused production environments (preferred)

What We Use

Our services run primarily in Kubernetes, hosted on AWS EKS
Our tooling includes Terraform, Helm, ArgoCD, Istio, Cloudflare, Datadog, and Incident.io
Our backend is primarily