Senior Software Engineer, Observability
CoreWeave
Compensation
$139,000 - $220,000/year
Job Description
What You’ll Do:
Join CoreWeave’s Observability team, responsible for building the systems that give our customers and internal teams unparalleled visibility into complex AI workloads. Our team empowers engineers to understand, troubleshoot, and optimize high-performance infrastructure at massive scale.
About the role:
As a Senior Software Engineer on the Observability team, you will design, build, and maintain core observability infrastructure spanning metrics, logging, tracing, and telemetry pipelines. Your day-to-day will involve developing highly reliable and scalable systems, collaborating with internal engineering teams to embed observability best practices, and tackling performance and reliability challenges across clusters of thousands of GPUs. You’ll also contribute to platform strategy and participate in on-call rotations to ensure critical production systems remain robust and operational.
Who You Are:
- 5+ years of experience in software or infrastructure engineering with a focus on designing, building, and operating large-scale distributed systems in production.
- Proficient in Go or Python with experience writing clean, testable, and resilient production code.
- Hands-on experience with Kubernetes, containerization, and microservices architectures in production environments.
- Proven ability to design and deliver scalable, robust systems with high-quality code, automated testing, and progressive release strategies.
- Skilled in decomposing complex problems in distributed architectures into manageable, well-scoped work.
- Familiar with Helm and YAML-based configurations for deploying and managing services, including templating, automation, and infrastructure-as-code practices.
- Experience participating in on-call rotations for critical production systems.
- Bachelor’s degree in Computer Science, Electrical Engineering, Mathematics, or a related field.
Preferred:
- Experience designing, operating, or scaling logging, metrics, or tracing platforms (e.g., Loki, ClickHouse, Elasticsearch, Prometheus, VictoriaMetrics, Grafana, Thanos).