Role Description
As a Corporate Site Reliability Engineer (SRE) at Dropbox, you will help lead the infrastructure strategy and technical direction of one of the most innovative technology companies globally. Successful candidates will possess a growth mindset and strong accountability, and will be passionate about designing, building, and securing scalable infrastructure services in a dynamic environment. You will drive improvement projects in automation and observability and handle incidents promptly but in a measured way. In this role, you'll serve as a technical lead for programs related to monitoring, metrics, alerting, and reliability throughout the IT Services organization, and contribute to the evolution of our world-class infrastructure while ensuring the utmost security and scalability.
Responsibilities
- Ensure the reliability, scalability, and performance of Dropbox's infrastructure and services
- Collaborate with cross-functional teams to develop and maintain best practices for monitoring, logging, and incident response
- Build, implement, and maintain automation and infrastructure-as-code tooling, specifically Terraform, Ansible, and GitHub Actions, as well as custom code platforms
- Utilize container orchestration platforms, such as Kubernetes, Amazon ECS, and Red Hat OpenShift, to manage containers at scale
- Manage and optimize monitoring and logging pipelines using tools like Datadog and Cribl LogStream
- Drive improvement projects related to service health and visibility for our stakeholders, ranging from developers to business service owners to C-level executives