Senior Site Reliability Engineer

Attentive

United States
On-site
Posted April 3, 2026

Job Description

Attentive® is the AI marketing platform for 1:1 personalization redefining the way brands and people connect. We’re the only marketing platform that combines powerful technology with human expertise to build authentic customer relationships. By unifying SMS, RCS, email, and push notifications, our AI-powered personalization engine delivers bespoke experiences that drive performance, revenue, and loyalty through real-time behavioral insights.
 
Recognized as the #1 provider in SMS Marketing by G2, Attentive partners with more than 8,000 customers across 70+ industries. Leading global brands like Crate and Barrel, Urban Outfitters, and Carter’s work with us to enable billions of interactions that power tens of billions in revenue for our customers.
 
With a distributed global workforce and employee hubs in New York City, San Francisco, London, and Sydney, Attentive’s team has been consistently recognized for its performance and culture. We’re proud to be included in Deloitte’s Fast 500 (four years running!), LinkedIn’s Top Startups, Forbes’ Cloud 100 (five years running!), Inc.’s Best Workplaces, and the Human Rights Campaign Foundation’s Corporate Equality Index!

About the Role

What You’ll Accomplish

  • Design and deliver high-impact solutions: Design and implement systems that enhance reliability, observability, traceability, and incident management, ensuring the platform scales effectively
  • Lead execution on key projects: Take ownership of projects, driving them from discovery through execution
  • Partner across teams: Collaborate with engineers from AI/ML, Data, Platform, and Product teams to develop best-in-class platforms and services
  • Establish standards and best practices: Define and enforce production standards, processes, and tools to ensure operational excellence
  • Champion reliability goals: Advocate for and implement SLIs, SLOs, and other reliability-focused metrics across the engineering organization
  • Mentor and knowledge share: Guide and mentor junior team members, fostering technical growth and helping to develop the next generation of engineering leaders
  • Innovate and inspire: Drive continuous improvement by bringing creative ideas and challenging the status quo

Your Expertise

  • 5+ years of experience in Production Engineering, SRE, Platform Engineering, DevOps, Backend Engineering, or similar roles
  • Strong coding ability in at least one language (e.g., Golang, Python, Java, TypeScript) with the capability to solve complex issues through code
  • Experience with cloud-native technologies and Infrastructure-as-Code (e.g., Kubernetes, Terraform, AWS)
  • Demonstrated experience delivering medium to large-scale projects that drive meaningful improvements in platform reliability and scalability
  • Deep understanding of production reliability concepts, including SLIs, SLOs, and incident management
  • Proficient in designing and maintaining CI/CD pipelines, deployment strategies, and release automation to enable fast, safe delivery
  • Fluency in frontier AI-assisted development tools and agents (Claude Code, Codex, Cursor, or similar)
  • Excellent verbal and written communication skills with the ability to collaborate across technical and non-technical teams
  • Familiarity with working in dynamic, reliability-focused production environments (preferred)

What We Use

  • Our services run primarily in Kubernetes, hosted on AWS EKS
  • Our tooling includes Terraform, Helm, ArgoCD, Istio, Cloudflare, Datadog, and Incident.io
  • Our backend is primarily

Tags: react, python, java, typescript, go, aws, kubernetes, machine learning, ai, frontend