Sr Site Reliability Engineer
The Trade Desk
Job Description
The Trade Desk is a global technology company and the world’s leading independent platform for digital advertising, with nearly 4,000 employees across more than 30 offices. Our technology helps advertisers reach the right audiences across the open internet — from streaming TV and podcasts to mobile apps, news, and more.
Advertising powers the content people love. By making it more transparent, effective, and responsible, we help support trusted journalism, quality entertainment, and creators worldwide. The world’s brands and agencies rely on us to reach their customers and grow their businesses responsibly.
The scale of our platform brings unique technical challenges — from processing massive datasets in real time to building systems that operate reliably on a global scale. When you work here, your impact is worldwide. We welcome diverse perspectives, encourage curiosity, and build teams that learn from one another. If you’re driven to solve meaningful challenges, we’d love to meet you.
About the team:
The Trade Desk Network Team owns end-to-end networking across one of the industry's most demanding infrastructures, spanning large-scale bare-metal datacenters and major public cloud platforms. We work at the intersection of network engineering and software development, embedding deeply with application, datacenter, and SRE teams to design and operate networks that power a global, high-performance ad tech platform. Our team takes a software-first approach to everything we build, and we actively integrate modern AI-assisted development tools like Cursor and Claude into our daily workflows. You will be engineering the future of network automation, not just maintaining the status quo.
Who are we looking for:
We are looking for a Senior Software Engineer who thrives at the intersection of deep networking expertise and software craftsmanship. You will partner closely with SRE and infrastructure teams to shape strategy and build the next generation of network automation, grounded in industry best practices and a bias toward scalable, maintainable solutions. You have an almost obsessive drive to keep networks healthy, performant, and resilient.
What will you be doing:
- Design, build, and scale a global network platform spanning physical datacenters and multi-cloud environments across AWS, Azure, and Alibaba Cloud.
- Support thousands of hosts worldwide, engineering reliable and efficient solutions to petabyte-scale data challenges.
- Own troubleshooting and resolution of complex network issues, upholding high availability and performance across the entire infrastructure footprint.
- Lead root cause analysis and postmortems, turning incidents into actionable improvements that raise the bar for operational excellence.
- Eliminate toil by building tools, automating workflows, and continuously improving the processes your team depends on every day.
- Share responsibility for network integrity through participation in a global, follow-the-sun on-call rotation.
What you bring to the table:
- You have 6-8 years of hands-on network automation and operational experience supporting large-scale production infrastructure.
- You have a software-first mindset with strong development and networking experience, able to think like an engineer and operate like an architect.
- You bring deep expertise in TCP/IP, the OSI model, and large-scale IP networking protocols including BGP and OSPF.
- You have hands-on experience with Kubernetes networking technologies such as Cilium and Calico, and a solid understanding of container network interfaces (CNIs).
- You have managed software load balancers like NGINX Ingress, Envoy, or HAProxy in large-scale production environments.
- You are skilled at troubleshooting and performance tuning in Kubernetes and Docker environments, with a focus on networking. Experience running Kubernetes clusters on bare-metal is a plus.
- You are proficient in advanced networking technologies, including:
  - IPv6 configuration and transition strategies
  - Software-Defined Networking (SDN) and SDN controller experience
  - Quality of Service (QoS) implementations and bandwidth management
- You have operated network devices at scale using network operating systems such as SONiC, Cisco IOS, JunOS, Arista EOS, or Nokia SR Linux/SR OS.
- You are comfortable with monitoring and alerting systems, writing complex rules and time-series queries using tools like Prometheus and Grafana.
- You practice infrastructure-as-code and apply DevOps and SRE principles to build and manage networks programmatically.
- You know how to build robust workflows and pipelines to test and safely deploy changes to production.
- You have an interest or background in platform engineering and can plan and build infrastructure to support large-scale, distributed systems.
- You are proficient in creating automation and building tools using Python or Go.
- You have experience integrating AI tools (LLMs, MCP, agentic workflows) into engineering processes to automate tasks and improve development velocity.
Key attributes:
Technical Contributions:
- Demonstrated experience building resilient, always-on networks across diverse technologies and layers.
- Data-driven decision maker who evaluates ROI, implementation complexity, and customer impact before committing to a direction.
- Operationally minded: you reduce complexity, mitigate risk, and keep scaling cost-effective as systems grow.
- A track record of self-directed, high-impact contributions to large-scale, long-horizon projects.
Collaboration and Communication:
- Strong communication and documentation skills, with the ability to distil complex technical topics and drive alignment across teams.
- Empathetic thinker who understands the broader context and motivations behind objectives, not just the immediate ask.
- Highly collaborative by nature, able to work fluidly across engineering disciplines and bring people together around a shared goal.
One of the best things about working at The Trade Desk is the breadth of technical opportunity, and we do not expect you to walk in knowing every technology we use. What we care about is your ability to learn quickly, think critically, and reach for the right tool for the job. What you know matters less than how fast you grow and how creatively you solve problems. We are not looking for engineers who have all the answers; we are looking for engineers who can invent answers no one has thought of yet, to questions no one has thought to ask.
As an Equal Opportunity Employer, The Trade Desk is committed to creating an inclusive hiring experience where everyone has the opportunity to thrive.
Please reach out to us at accommodations@thetradedesk.com to request an accommodation or to discuss any accessibility needs you may have in accessing our company website or navigating any part of the hiring process.
When you contact us, please include your preferred contact details and specify the nature of your accommodation request or questions. Any information you share will be handled confidentially and will not impact our hiring decisions.