
Senior Platform Engineer, Voice AI

Together AI

San Francisco
On-site
Posted March 30, 2026

Job Description

About the Role

Together AI is building the best inference infrastructure for voice applications. Our Voice AI platform powers production-grade, real-time voice agents and applications — serving speech-to-text and text-to-speech models with best-in-class latency and reliability.

We're looking for a Senior Platform Engineer to own the API and infrastructure layer for voice workloads. You'll build the real-time WebSocket and HTTP APIs that developers use to ship voice experiences, design autoscaling for latency-sensitive streaming workloads, and ensure our multi-provider voice platform is reliable enough for production voice agents handling millions of calls.

This is a foundational hire on a small, high-impact team. Voice APIs have fundamentally different infrastructure requirements than text-based inference — bidirectional audio streaming, stateful connections, tight latency SLOs, and complex multi-model routing. You'll define how developers interact with Together's voice platform as we grow from early customers to the default infrastructure for voice AI.

  • Own the real-time API layer (WebSocket + HTTP streaming) that powers Together's voice platform.
  • Design autoscaling and orchestration for voice workloads running on tens of thousands of GPUs.
  • Build the developer experience — APIs, observability, and tooling — for a fast-growing product area.
  • Work with production voice customers (contact centers, AI agents, communication platforms) to ship what they actually need.
  • Join a small, early-stage team with outsized impact on a new product line.

Responsibilities

  • Build and harden real-time WebSocket and HTTP streaming APIs for STT and TTS — including connection lifecycle management, backpressure, error handling, and reconnection, at the reliability bar needed for production voice agents.
  • Design and ship autoscaling for voice model endpoints that handles bursty, real-time traffic patterns — accounting for concurrent connection limits, streaming state, and hard latency ceilings.
  • Implement voice-specific API features: word-level alignment, real-time speaker diarization, audio format flexibility (G.711 µ-law for telephony, PCM, WebRTC formats), pronunciation controls, and multi-context WebSocket support.
  • Build voice-specific observability — latency breakdowns, audio quality signals, and dashboards that help both the team and customers debug issues.
  • Own multi-model normalization across our model partners (Cartesia, Deepgram, Rime, and others), ensuring consistent API behavior regardless of the underlying provider.
  • Collaborate with the ML engineering side of the team on the interface between the API layer and the model serving stack, ensuring latency and reliability requirements are met end-to-end.
  • Contribute to the developer experience: API design, documentation, integration cookbooks, and the playground, showcasing how best-in-class voice agents are built.
  • Lay the groundwork for multiple new products down the line.
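To make the reconnection and backpressure work above concrete: a common pattern for latency-sensitive streaming clients is exponential backoff with full jitter. The sketch below is purely illustrative (it is not Together AI code); `connect()` in the commented loop is a hypothetical coroutine standing in for a real WebSocket session.

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: a delay in [0, min(cap, base * 2**attempt)].

    attempt is 0-indexed; the cap keeps long outages from producing huge waits,
    and the jitter spreads out reconnect storms after a mass disconnect.
    """
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

# Sketch of a reconnect loop around a hypothetical connect() coroutine:
#
#   attempt = 0
#   while True:
#       try:
#           await connect()          # stream audio until the socket drops
#           attempt = 0              # healthy session: reset the counter
#       except ConnectionError:
#           await asyncio.sleep(backoff_delay(attempt))
#           attempt += 1
```

Full jitter (rather than a fixed multiplier) matters for voice platforms because thousands of agents reconnecting in lockstep after a node failure would otherwise arrive as synchronized traffic spikes.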

Requirements

  • 5+ years of experience building large-scale, real-time distributed systems and API services.
  • Deep expertise in real-time streaming infrastructure — WebSocket server architecture, Server-Sent Events, bidirectional streaming, connection multiplexing, and stateful protocol design.
  • Expert-level programming in TypeScript and Python; experience with Rust is a plus.
  • Strong distributed systems fundamentals: load balancing, autoscaling, rate limiting, and traffic shaping for latency-sensitive workloads.
  • Experience with Kubernetes — including custom autoscalers, resource management, and health checking for stateful services.
  • Strong product sense — you care about API ergonomics and think about what developers building voice apps actually need.
  • Comfort working on a small, early-stage team where you'll wear multiple hats and move fast.
  • Experience with audio or media protocols (WebRTC, G.711, PCM encoding) is a strong plus.
  • Familiarity with ML model serving infrastructure and how inference engines work is a plus — you'll interface with the serving layer regularly.
  • Full-stack experience (React, Next.js) is a nice-to-have for contributing to developer-facing tooling.
  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field.
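As context for the telephony formats mentioned above: G.711 µ-law is an 8-bit companded encoding used on phone networks, and decoding it to 16-bit linear PCM is a small bit-manipulation exercise. This is a minimal illustrative sketch of the standard G.711 expansion, not code from the role.

```python
def mulaw_decode(data: bytes) -> list[int]:
    """Decode 8-bit G.711 mu-law bytes to 16-bit signed linear PCM samples."""
    out = []
    for b in data:
        b = ~b & 0xFF                  # mu-law bytes are stored complemented
        sign = b & 0x80                # top bit: negative sample
        exponent = (b >> 4) & 0x07     # 3-bit segment number
        mantissa = b & 0x0F            # 4-bit position within the segment
        # Rebuild the magnitude, then remove the 0x84 bias added at encode time.
        sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
        out.append(-sample if sign else sample)
    return out
```

The full-scale values come out at ±32124 rather than ±32767, which is the expected G.711 dynamic range.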
Tags: React, Python, TypeScript, Go, Rust, Kubernetes, AI, Data, Product, Design