Site Reliability Engineer II - LATAM
Backblaze
Job Description
About Backblaze
Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets, unburden administrators, and unleash innovators. Together with our partners, we’re helping customers break free from the restrictive, overpriced legacy solutions that hold them back, and blaze forward with the full power of the open cloud in their hands.
Founded in 2007, we scaled the business with less than $3 million in outside funding until 2021, when we completed a traditional IPO on the Nasdaq stock exchange. Today, Backblaze generates over $100M in revenue and is the leading specialized storage cloud, managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries, including businesses, developers, IT professionals, and individuals.
About the Role
We are seeking a Site Reliability Engineer II (SRE II) to help ensure the stability, scalability, and reliability of our services and infrastructure. This role focuses on building automation, maintaining observability, and supporting incident response to keep customer-facing systems performing at their best. The SRE will collaborate with engineering, product, and operations teams to embed reliability practices into day-to-day development and operations while contributing to tools and processes that improve efficiency and reduce manual effort.
Key Responsibilities
Service Reliability & Operations
- Support the availability and durability of critical services across production environments.
- Monitor service health using SLIs, SLOs, and error budgets, and escalate issues when thresholds are at risk.
- Participate in on-call rotations, incident response, and post-incident reviews to drive service improvements.
- Follow established ITIL/OSS processes (incident, change, problem, and capacity management).
Automation & Tooling
- Develop automation for common operational tasks, reducing manual intervention and toil.
- Contribute to monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, Catchpoint, ELK).
- Work with CI/CD pipelines, configuration management, and infrastructure as code tools (Terraform, Ansible, Jenkins).
- Write scripts (Bash, Python, Go, etc.) to improve system reliability and efficiency.