Staff Site Reliability Engineer, Cloud

Kentik

Compensation

$165,000 - $200,000/year

Remote – United States

Posted February 13, 2026

Job Description

Who we are

Kentik is the network intelligence platform for modern infrastructure teams. Unlike traditional monitoring and observability tools, we demystify complex network operations, enabling organizations to deliver applications and innovation at scale. Built by network experts to make critical insight accessible to every engineer, Kentik is the real-time source of truth that understands every network in context — from data center to cloud to the internet. This single platform unifies and correlates cloud, device, flow, and synthetic data to turn telemetry into action. Market leaders like Akamai, Booking.com, Dropbox, and Zoom rely on Kentik to run, manage, and optimize their networks.

What we do

Our platform ingests trillions of records and serves hundreds of thousands of queries for our users each day. You will gain experience building a production-quality, high-performance server-and-client SaaS application that handles uniquely high volumes of data.

We have built a team of world-class engineers, network experts, and technology thought leaders, with a remote-friendly culture from day one. While prior experience in a remote environment is not required, we highly value strong collaboration and communication skills, as well as a high level of independence and autonomy.

What you'll do

Kentik is looking for a Staff-level Site Reliability Engineer (Cloud) to join our Product Engineering team and help build and maintain our Synthetics and Cloud product lines. These products have multiple applications deployed in various cloud providers all over the world. We manage these cloud applications using observability tooling, automated build processes, and adherence to configuration-as-code best practices.

We’re looking for an experienced engineer who will work with engineering teams across the company to help grow our hardware and software infrastructure. We operate a well-organized, well-instrumented platform, and offer enormous opportunities for employee growth.

Make sure our real-time, scalable infrastructure is set up for growth and working efficiently. Our infrastructure runs on our own hardware across multiple locations, as well as on all major cloud vendors.
Work on tools and processes to better monitor our platform and ensure its stability through our rapid growth
Deep-dive into diverse topics, from firewalls and IP routing to database replication strategies and automating build processes
Collaborate with engineering and infrastructure teams on finding solutions from an operational perspective
Assist with expanding our cloud deployments across the major cloud providers
Contribute code, code reviews, tools, and patches across all kinds of existing code
Write design documents or collaborate on colleagues’ docs to introduce new features or changes into our infrastructure
Provide valuable feedback on team goals, projects, and processes. We believe in continuously improving our team.

What you'll bring

Studies have shown that some candidates tend to apply to jobs only if they meet 100% of the qualifications. We encourage you to apply if you meet most of the criteria - even if you don’t match all of the qualifications, your skills and experience could be valuable in this role!

8+ years of experience in cloud-based systems administration, IT, and/or SRE-related projects.
Expertise in public cloud environments such as AWS, GCP, Azure, or OCI.
Strong command of containerization and orchestration using Docker and Kubernetes.
Solid programming and automation skills using Bash, Python, or Go.
Proficiency with Infrastructure as Code (IaC) and configuration management platforms such as Terraform, Ansible, and Puppet.
Proficiency in Linux administration and command-line tools (e.g., SSH, grep, awk).
Detailed understanding of major internet protocols (TCP/IP, DNS, HTTP, TLS).
Network administration experience: concepts such as routing, firewalls (iptables), and peering should sound familiar.
A passion for documenting code, processes, and infrastructure in runbooks and wikis.
Experience with metrics monitoring solutions such as Grafana, prometh