Senior Software Engineer, Observability

CoreWeave

Compensation

$139,000 - $220,000/year

New York, NY / Sunnyvale, CA
Hybrid
Posted April 17, 2026

Job Description

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

What You’ll Do:

Join CoreWeave’s Observability team, responsible for building the systems that give our customers and internal teams unparalleled visibility into complex AI workloads. Our team empowers engineers to understand, troubleshoot, and optimize high-performance infrastructure at massive scale.

About the role:
As a Senior Software Engineer on the Observability team, you will design, build, and maintain core observability infrastructure spanning metrics, logging, tracing, and telemetry pipelines. Your day-to-day will involve developing highly reliable and scalable systems, collaborating with internal engineering teams to embed observability best practices, and tackling performance and reliability challenges across clusters of thousands of GPUs. You’ll also contribute to platform strategy and participate in on-call rotations to ensure critical production systems remain robust and operational.

Who You Are:

  • 5+ years of experience in software or infrastructure engineering with a focus on designing, building, and operating large-scale distributed systems in production.
  • Proficient in Go or Python with experience writing clean, testable, and resilient production code.
  • Hands-on experience with Kubernetes, containerization, and microservices architectures in production environments.
  • Proven ability to design and deliver scalable, robust systems with high-quality code, automated testing, and progressive release strategies.
  • Skilled in decomposing complex problems in distributed architectures into manageable, well-scoped work.
  • Familiar with Helm and YAML-based configurations for deploying and managing services, including templating, automation, and infrastructure-as-code practices.
  • Experience participating in on-call rotations for critical production systems.
  • Bachelor’s degree in Computer Science, Electrical Engineering, Mathematics, or related field.

Preferred:

  • Experience designing, operating, or scaling logging, metrics, or tracing platforms (e.g., Loki, ClickHouse, Elasticsearch, Prometheus, VictoriaMetrics, Grafana, Thanos).