Staff

Member of Technical Staff, Inference (Bay Area, Remote)

Genesis AI

Bay Area

Remote

Posted September 9, 2025

Job Description

What You’ll Do

Build low-latency inference pipelines for on-device deployment, enabling real-time next-token and diffusion-based control loops in robotics
Design and optimize distributed inference systems on GPU clusters, pushing throughput with large-batch serving and efficient resource utilization
Implement efficient low-level code (CUDA, Triton, custom kernels) and integrate it seamlessly into high-level frameworks
Optimize workloads for both throughput (batching, scheduling, quantization) and latency (caching, memory management, graph compilation)
Develop monitoring and debugging tools that ensure reliability and determinism and enable rapid diagnosis of regressions across both the on-device and cluster stacks

What You’ll Bring

Deep experience in distributed systems, ML infrastructure, or high-performance serving (8+ years)
Production-grade expertise in Python, with a strong background in systems languages (C++/Rust/Go)
Low-level performance mastery: CUDA, Triton, kernel optimization, quantization, memory and compute scheduling
Proven track record scaling inference workloads in both throughput-oriented cluster environments and latency-critical on-device deployments
System-level mindset with a history of tuning hardware–software interactions for maximum efficiency, throughput, and responsiveness