Software Engineer, Observability
CoreWeave
Compensation
$109,000 - $145,000/year
Job Description
What You’ll Do:
The Observability team builds and operates CoreWeave’s logging, tracing, and metrics platforms, along with the high-throughput data pipelines that power them. This team enables both internal engineers and customers to monitor, troubleshoot, and optimize AI workloads running on GPU-dense infrastructure at massive scale.
About the role:
As a Software Engineer on the Observability team, you will design, build, and maintain scalable systems that process and surface telemetry data across distributed environments. You’ll contribute production-quality code in languages like Go and Python, while improving system reliability through enhanced monitoring, alerting, and incident response practices. Day to day, you’ll collaborate with cross-functional engineering teams to implement observability best practices, support production systems, and help optimize performance across large-scale infrastructure. You will also participate in on-call rotations and contribute to continuous improvements based on real-world system behavior.
Who You Are:
- 2+ years of experience in Software Engineering, Site Reliability Engineering, DevOps, or a related field
- Proficiency in at least one programming or scripting language (e.g., Python, Go)
- Experience working with Kubernetes, containerization, and microservices architectures
- Experience participating in on-call rotations, including triaging and escalating production issues
- Hands-on experience using observability systems (metrics, logging, tracing) to debug distributed systems
Preferred:
- Experience operating observability platforms or databases (e.g., ClickHouse, Elastic, Loki, VictoriaMetrics, Prometheus, Thanos, OpenTelemetry, Grafana)
- Familiarity with infrastructure-as-code tools such as Terraform
- Experience with modern testing frameworks and deployment strategies (e.g., canary, blue-green)
- Experience with data streaming technologies (e.g., Kafka, Kafka Connect)
Similar Jobs
Roku
Senior Software Engineer - Cloud Infrastructure & Observability
CoreWeave
Senior Software Engineer, Observability Insights
CoreWeave
Senior Software Engineer, Observability
Gympass
Staff Platform Engineer | Observability
Anthropic