Staff Site Reliability Engineer
Blink Health
Job Description
Company Overview
Blink Health is the fastest-growing healthcare technology company that builds products to make prescriptions accessible and affordable to everybody. Our two primary products – BlinkRx and Quick Save – remove traditional roadblocks within the current prescription supply chain, resulting in better access to critical medications and improved health outcomes for patients.
BlinkRx is the world’s first pharma-to-patient cloud that offers a digital concierge service for patients who are prescribed branded medications. Patients benefit from transparent low prices, free home delivery, and world-class support on this first-of-its-kind centralized platform. With BlinkRx, never again will a patient show up at the pharmacy only to discover that they can’t afford their medication, their doctor needs to fill out a form for them, or the pharmacy doesn’t have the medication in stock.
We are a highly collaborative team of builders and operators who invent new ways of working in an industry that historically has resisted innovation. Join us!
Responsibilities
- Establish and evolve SRE best practices across the organization, including reliability principles, error budgets, incident response, postmortems, and operational readiness standards.
- Define and drive observability strategy for system health, performance, and reliability, including SLIs/SLOs, alerting quality, dashboards, and service health indicators.
- Design and implement software-driven solutions within the infrastructure domain, automating manual processes and eliminating operational complexity and toil.
- Act as a technical leader and force multiplier, helping set priorities and influencing decision-making across core cloud infrastructure, reliability tooling, and platform architecture.
- Take ownership of large, ambiguous initiatives, driving them from concept to delivery while aligning stakeholders across engineering, security, and product.
- Combine deep knowledge of software development, infrastructure, and security to improve platform resilience, scalability, performance, and compliance.
- Proactively identify systemic risks and reliability gaps, recommending and leading platform upgrades and architectural improvements before they become incidents.
- Partner with engineering teams to improve developer workflows, tooling, and operational maturity, increasing productivity while reducing cognitive load.
- Provide technical mentorship, architecture guidance, and high-quality design and code reviews for engineers across infrastructure and product teams.
- Lead by example in documentation and knowledge sharing, ensuring systems and processes are well understood and not dependent on any one individual.
- Participate in and help mature incident response, escalation practices, and post-incident learning across the organization.
Desired Experience
- Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.
- 7+ years of experience in site reliability engineering, infrastructure engineering, or platform engineering roles, with demonstrated impact at scale.
Reliability & Troubleshooting
- Expert-level, methodical troubleshooting across the entire stack, from application to kernel to network.
- Strong command-line proficiency and deep expertise in Linux systems and operating system fundamentals.
- Advanced understanding of networking concepts including load balancing, proxies, DNS, TCP/IP, NAT, and service-to-service communicati