
Senior Cloud Platform Engineer

SambaNova Systems

Palo Alto, California, United States
Hybrid
Posted November 22, 2025

Job Description

The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, and drive efficiency and innovation, fundamentally transforming their businesses and operations at scale.

SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once a model is adapted with customer data, the customer retains ownership of it in perpetuity, turning generative AI into one of their most valuable assets.

About SambaNova Systems

Join the company that's building the future of AI computing. At SambaNova, we are disrupting the AI and high-performance computing space with our integrated hardware and software platform. Our DataScale systems and SambaFlow software are pushing the boundaries of what's possible with generative AI and large language models. We are a team of passionate innovators tackling some of the world's most challenging computational problems.

 

The Role

As a Senior Cloud Site Reliability Engineer (SRE) specializing in our AI Inferencing Service, you will be the guardian of its reliability, performance, and scalability. You will bridge the gap between software development and operations, applying an engineering mindset to solve operational challenges. Your primary focus will be ensuring our inference endpoints have exceptional uptime, low-latency response times, and efficient resource utilization, directly impacting the experience of our customers and the success of our AI products. This role includes participating in a shared on-call rotation to maintain 24/7 service reliability. 

 

What You'll Do

Service Ownership & On-Call: Take shared ownership of the production inferencing service, including its availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning across multiple regions. This includes implementing and supporting AI infrastructure in new regions, such as Asia, Europe, and Latin America, to support the growth of our business. Participate in a balanced on-call rotation to provide 24/7 support for the service.

 

On-Call & Work-Life Balance

We believe a sustainable on-call schedule is critical for long-term success and team health. Our on-call philosophy is built on the following principles:

  • Balanced Rotation: The on-call rotation is shared equally across the team, typically following a primary/secondary or follow-the-sun model, so that no single person bears a disproportionate burden.
  • Focus on Prevention: We invest heavily in automation, robust testing, and system design to prevent pages before they happen. The goal of on-call is not to heroically fight fires, but to manage rare, complex failures and use those learnings to make the system more resilient.
  • Actionable Alerts: We have a strict policy against alert fatigue: an alert fires only when it is actionable and requires immediate human intervention.
  • Incident Management: Lead the response to incidents affecting the inferencing service, driving blameless post-mortems and implementing corrective actions to prevent recurrence.
  • Monitoring & Alerting: Develop and maintain advanced monitoring, alerting, and dashboarding (using tools like Prometheus, Grafana, Datadog) to gain deep insights into service health, model performance (e.g., latency, throughput, error rates), and accelerator utilization. A key responsibility is ensuring alerts are actionable and have a low false-positive rate, minimizing on-call fatigue.
  • Performance & Scalability: Proactively identify and eliminate performance bottlenecks. Design and implement auto-scaling policies to handle variable inference loads cost-effectively. Use insights from on-call incidents to drive improvements.
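To make the "actionable alerts, low false-positive rate" principle concrete, here is a minimal paging-decision sketch. The WindowStats shape, should_page helper, and all thresholds are illustrative assumptions, not SambaNova's actual alerting rules; the key idea is gating on traffic volume so low-signal windows never page anyone.

```python
from dataclasses import dataclass

@dataclass
class WindowStats:
    """Aggregated request metrics over one evaluation window (hypothetical)."""
    total: int              # requests in the window
    errors: int             # failed requests in the window
    p99_latency_ms: float   # 99th-percentile latency

def should_page(stats: WindowStats,
                max_error_rate: float = 0.01,
                max_p99_ms: float = 500.0,
                min_requests: int = 100) -> bool:
    """Page only on sustained, actionable symptoms.

    Windows with too little traffic are ignored, which keeps the
    false-positive rate low: a handful of errors at 3 a.m. with
    almost no traffic is not worth waking an engineer.
    """
    if stats.total < min_requests:
        return False  # not enough signal to act on
    error_rate = stats.errors / stats.total
    return error_rate > max_error_rate or stats.p99_latency_ms > max_p99_ms

# Healthy window: 0.5% errors, p99 well under budget -> no page
print(should_page(WindowStats(total=10_000, errors=50, p99_latency_ms=220.0)))
# Error-rate breach: 5% errors -> page
print(should_page(WindowStats(total=10_000, errors=500, p99_latency_ms=220.0)))
```

In production this logic would typically live in a Prometheus alerting rule or similar, evaluated over a rolling window rather than in application code.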
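The auto-scaling responsibility above can be sketched as a target-tracking policy: run just enough replicas to serve the current load at a per-replica capacity target, clamped between a warm floor and a cost ceiling. The function name, RPS capacity figure, and bounds below are hypothetical, not the posting's actual policy.

```python
import math

def desired_replicas(current_rps: float,
                     rps_per_replica: float,
                     min_replicas: int = 2,
                     max_replicas: int = 32) -> int:
    """Target-tracking scaling for an inference endpoint (illustrative).

    min_replicas keeps warm capacity for latency, max_replicas caps cost;
    both bounds are assumptions for this sketch.
    """
    if rps_per_replica <= 0:
        raise ValueError("rps_per_replica must be positive")
    needed = math.ceil(current_rps / rps_per_replica)
    return max(min_replicas, min(max_replicas, needed))

print(desired_replicas(0.0, 50.0))       # floor keeps warm capacity
print(desired_replicas(1_000.0, 50.0))   # 1000 / 50 = 20 replicas
print(desired_replicas(10_000.0, 50.0))  # clamped at the ceiling
```

Real deployments would usually express this declaratively (e.g., a Kubernetes HorizontalPodAutoscaler with stabilization windows to avoid flapping) rather than hand-rolling the loop.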