Software Development Engineer, ElastiCache

Amazon Development Center U.S., Inc.

Seattle, WA, USA

On-site

Posted April 11, 2026

Job Description

This is an opportunity to join one of AWS's most foundational and high-impact engineering teams — the In-Memory Computing Platform team, part of Amazon ElastiCache. We build the next-generation, high-performance in-memory distributed data storage platform that powers some of the world's most demanding real-time applications. Our work sits at the intersection of distributed systems, database internals, and cloud-scale infrastructure, and it directly shapes how millions of AWS customers build low-latency, high-throughput applications.

If you've ever found yourself deep in a conversation about CAP theorem, consistent hashing, Paxos, or gossip protocols — and you want to apply those ideas to real-world systems at massive scale — this team is where you belong. We are the engineers behind the acclaimed Amazon Dynamo paper, and we continue to push the boundaries of what NoSQL systems can do. We're not just building a cache; we're building a durable, highly available, and scalable in-memory database platform that bridges the best of RDBMS and NoSQL worlds.

Key job responsibilities
As a Software Development Engineer on this team, you will take on broad ownership across the full lifecycle of our platform. Your core responsibilities will include:

- Designing and building the next-generation in-memory NoSQL database platform, enabling developers to create highly available, scalable, and high-performance applications at unprecedented scale.
- Leading software development of large-scale distributed in-memory storage systems, primarily in Java and C/C++, leveraging open-source technologies such as Redis and Memcached alongside Amazon-proprietary technologies.
- Developing and operating HTTP/REST services, asynchronous messaging systems, and event-driven architectures that form the backbone of our platform.
- Building and improving real-time failure detection and auto-remediation systems capable of detecting node failures in large distributed clusters and initiating recovery within seconds.
- Driving horizontal and vertical scaling capabilities, management and monitoring plane workflows, fault tolerance mechanisms, and backup and restore technologies.
- Contributing to disaster recovery and prevention strategies to ensure the highest levels of availability and durability for our customers.
- Mentoring and growing junior engineers on the team, serving as a technical leader and role model for engineering best practices.
- Managing individual project priorities, deadlines, and deliverables with a high degree of autonomy and accountability.

A day in the life
Day-to-day, you can expect a dynamic mix of deep technical work and collaborative engineering. A typical week might look like:

- Writing and reviewing production-quality code in Java or C/C++ for distributed storage components, scaling systems, and services owned by the monitoring plane.
- Participating in design reviews and architecture discussions, where you'll debate tradeoffs around consistency, availability, and partition tolerance, and then go build the solution.
- Collaborating with peer engineers across the team to debug complex distributed systems issues, analyze failure patterns, and drive root cause analysis.
- Working closely with the monitoring plane and operations team to improve observability, tune auto-remediation workflows, and reduce mean time to recovery.
- Engaging with product and customer service teams to understand real-world use cases, from IoT and mobile applications to large-scale analytics, and translating those needs into platform capabilities.
- Mentoring junior engineers through code reviews, design feedback, and pairing sessions, helping them grow their technical skills and Amazon engineering judgment.
- Contributing to the team's operational excellence by participating in on-call rotations and driving improvements to system reliability and operational tooling.

About the team
The In-Memory Storage Platform team is a passionate group of engineers who thrive on solving hard distributed systems problems. We are a collaborative, intellectually curious team that values technical depth, ownership, and a bias for action. Our charter is Amazon ElastiCache — an AWS service that enables customers to deploy, manage, and massively scale in-memory distributed data stores using open-source technologies like Redis and Memcached.

Our customers include some of the world's fastest-growing startups and enterprises, all relying on ElastiCache to build ultra-low-latency, high-throughput data layers. We are deeply invested in open-source software and actively contribute to the broader NoSQL ecosystem. As a team, we believe in growing together — senior engineers are directly involved in mentoring and developing junior engineers, and we take pride in building a culture of technical excellence and continuous learning.

About AWS
