Senior Site Reliability Engineer (Cloud Platform)

Iterable

Hybrid - Lisbon, Portugal

Posted March 12, 2026

Job Description

Iterable is the leading AI-powered customer engagement platform that helps leading brands like Redfin, SeatGeek, Priceline, Calm, and Box create dynamic, individualized experiences at scale. Our platform empowers organizations to activate customer data, design seamless cross-channel interactions, and optimize engagement—all with enterprise-grade security and compliance. Today, nearly 1,200 brands across 50+ countries rely on Iterable to drive growth, deepen customer relationships, and deliver joyful customer experiences.

Our success is powered by extraordinary people who bring our core values (Trust, Growth Mindset, Balance, and Humility) to life. We foster a culture of innovation, collaboration, and inclusion, where ideas are valued and individuals are empowered to do their best work. That’s why we’ve been named one of Inc.’s Best Workplaces and Fastest Growing Companies, and were recognized on Forbes’ list of America’s Best Startup Employers in 2022. Iterable has also been listed on Wealthfront’s Career Launching Companies List and has held a top 10 ranking on the Top 25 Companies Where Women Want to Work.

With a global presence—including offices in San Francisco, New York, Denver, London, and Lisbon, plus remote employees worldwide—we are committed to building a diverse and inclusive workplace. We welcome candidates from all backgrounds and encourage you to apply. Learn more about our story and mission on our Culture and About Us pages. Let’s shape the future of customer engagement together!

How you will make an impact:

As a Senior Engineer on the Cloud Platform team, your impact will be measured by the continuous improvement of our platform’s reliability, scalability, and security posture.

SLO Ownership & Error Budget Management: Take direct ownership of the established Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for core platform services (e.g., latency, availability, error rate). You will manage the Error Budget and use it as the primary driver for prioritizing reliability work.
Scale and Harden the Core Platform: Apply deep technical expertise in Kubernetes, AWS, traffic management, and Infrastructure-as-Code to scale and harden the foundational platform that powers Iterable’s product workloads.
Drive Systemic Improvements: This role centers on hands-on engineering skill, technical leadership, and systemic reliability improvements within our complex, distributed multi-region platform.

What you’ll do

Kubernetes Platform Engineering
Use your Kubernetes and AWS expertise to evolve EKS lifecycle, multi-tenant isolation, and regional consistency, ensuring clusters remain secure, performant, and predictable as we scale.
Traffic & Ingress Reliability
Apply advanced knowledge of cloud-native traffic management and API gateways to strengthen routing, authentication, rate-limiting, and secure communication protocols (like mTLS). This focus will dramatically improve both the reliability and security posture of the platform’s public and internal service access points.
Infrastructure-as-Code at Scale
Demonstrate mastery in IaC to manage complex, multi-region architecture. Use tools like Terraform Cloud to build reusable modules, validate changes through policy-as-code, and establish safe multi-account patterns our teams can rely on.
Security & Access Control
Drive a zero-trust posture by establishing service guardrails and access controls across the platform. This includes: implementing policy-a