Senior Software Engineer, Observability Insights
CoreWeave
Compensation
$165,000 - $242,000/year
Job Description
What You’ll Do:
Join CoreWeave’s Observability team, where we are building the next-generation insights layer for AI systems. Our team empowers internal and external users to understand, troubleshoot, and optimize complex AI workloads by transforming telemetry into actionable insights.
About the Role:
As a Senior Software Engineer on the Observability Insights team, you will lead the development of agentic interfaces and product experiences that sit atop CoreWeave’s telemetry layer. You’ll design multi-tenant APIs, managed Grafana experiences, and MCP-based tool servers to help customers and internal teams interact with data in innovative ways. Collaborating closely with PMs and engineering leadership, your work will shape the end-to-end observability experience and influence how people engage with cutting-edge AI infrastructure.
Who You Are:
- 6+ years of experience in software or infrastructure engineering, building production-grade backend systems and distributed APIs.
- Strong focus on developer-facing infrastructure, with a customer-obsessed approach to SDKs, CLIs, and APIs.
- Proficient in reliability engineering, including fault-tolerant design, SLOs, error budgets, and multi-tenant system resilience.
- Familiar with observability systems such as ClickHouse, Loki, VictoriaMetrics, Prometheus, and Grafana.
- Experienced in agentic applications or LLM-based features, including grounding, tool calling, and operational safety.
- Comfortable writing production code primarily in Go, with the ability to integrate Python components when needed.
- Experience collaborating in agile teams to deliver end-to-end telemetry-to-insights pipelines.
Preferred:
- Experience operating Kubernetes clusters at scale, especially for AI workloads.
- Hands-on experience with logging, tracing, and metrics platforms in production, with deep knowledge of cardinality, indexing, and query optimization.
- Experienced in running distributed systems or API services at cloud scale, including event streaming and