Data Reliability Engineer II
Groupon
Job Description
Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.
Groupon is on a radical journey to transform our business with relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company. We're big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.
About the Role
The Data Reliability Engineering (DRE) team at Groupon ensures that all organizational data powering financial reporting, revenue management, HR, marketing, and finance operates within defined service levels. The team has evolved into a software-first reliability engineering function managing hundreds of data pipelines across multiple verticals.
This SDE II position is a full-time role replacing a contract position, reflecting the team’s need for deep business understanding, direct stakeholder relationships, and production accountability. You will work alongside engineers in Bangalore, collaborating with a globally distributed team across the US, Brazil, and Europe.
What You'll Do
Incident Response & Reliability Operations
- Own on-call rotations - triage and investigate pipeline failures across Google Cloud Composer (Airflow), Keboola, N8N, and third-party vendor jobs with end-to-end accountability; perform deep root cause analysis to identify permanent fixes.
- Maintain SLO/SLA compliance across Tier 1 and Tier 2 data assets; author blameless postmortems, incident runbooks, and handover documentation.
Monitoring & Automation
- Build automation tools and own internal products (Claude/Google ADK Agent, Prometheus query-exporter) as a software engineer; integrate automated data quality checks (schema validation, null-count, freshness) into GitHub Actions CI/CD pipelines.
- Manage and optimize ETL/ELT workflows (SQL/Python) across revenue, pricing, promotions, event data streams, and the customer data platform; contribute to self-healing mechanisms with automated retries and dynamic scaling.
Platform & Infrastructure
- Provision, configure, and upgrade Google Cloud Composer environments (Dev/Staging/Prod); manage CI/CD governance, secrets, access permissions, and cloud security posture for production infrastructure.
- Conduct capacity planning reviews for the data warehouse, forecasting scaling needs based on utilization trends.
Observability & Stakeholder Collaboration
- Instrument SLO boards in JSM to reduce paging fatigue; drive cost-tuning and performance initiatives; support the Data Unification initiative and broader data architecture goals.
- Serve as the communication bridge between technical failures and business stakeholders (Sales, Marketing, Finance, Revenue); partner on pipeline design reviews and contribute to quarterly Reliability Reviews.
What We're Looking For
Must-Have Skills
- Python / Linux Scripting Proficiency - production scripting that makes use of, but does not rely entirely on, AI-generated black-box code.
- Expert SQL - complex queries, large-table transactions (50M+ rows), execution plans, and lock prevention.
- Git Fluency - branching, stashing, cherry-picking, and hotfix workflows in a fast-moving sprint environment.
- Root Cause Analysis Autonomy - trace failures across multiple systems and layers, not just restart services.
- Production System Experience - exclus