
Senior / Staff Software Engineer (Observability / SRE)


Waabi

Toronto, ON
Remote
Posted February 12, 2026

Job Description

Waabi, founded by AI visionary Raquel Urtasun, is the leader in Physical AI. With a world-class team, we're unlocking the next era of autonomous transportation with technology that's powering commercial autonomous trucks and robotaxis. Waabi is backed by and partners with world leaders in AI, automotive, logistics, and deep tech.

With offices in Toronto, San Francisco, Dallas, and Pittsburgh, Waabi is growing quickly and looking for diverse, innovative, and collaborative candidates who want to impact the world in a positive way. To learn more, visit www.waabi.ai.

You will:
- Design and lead the architecture and development of Waabi’s monitoring and observability stack, used to monitor the health and performance of cloud and on-prem environments.
- Develop and extend workloads and benchmarks (compute, storage, network, ML/AI) and integrate stress, chaos, and regression tests to validate hardware and platform choices.
- Analyze and optimize end-to-end performance across hardware, firmware, Linux kernel, runtimes, and distributed services using advanced profiling tools (perf, eBPF, flamegraphs, tracing frameworks).
- Build automation and observability tooling (Go/Python/Java, Kubernetes/Docker) for CI/CD-based performance regression detection, telemetry, alerting, and anomaly detection.
- Work with client teams to support their applications’ observability requirements.
- Influence system architecture and tooling decisions that improve how Waabi builds, monitors, and scales its infrastructure.
- Drive execution and quality: write design docs, set milestones, mentor ICs, and communicate insights and results to stakeholders and leadership.
 
Qualifications:
- 5+ years software engineering or systems/performance engineering experience (BS in CS/EE or related), with demonstrated end-to-end ownership of complex projects.
- Proficient in at least one of: Python, Rust, C/C++; strong CS fundamentals and system design skills.
- Hands-on with Linux internals (CPU scheduling, memory, I/O, networking) and perf tooling (perf, eBPF, flamegraphs, tracing frameworks).
- Experience with Kubernetes, microservices, and distributed systems; comfort building production services and pipelines.
- Proven track record of clear communication, writing design docs, and leading cross-functional efforts.
 
Bonus:
- Experience deploying and managing observability platforms (OpenTelemetry, Grafana OSS).
- Performance tuning for databases/streaming/batch/ML platforms; GPU/xPU or Arm performance exposure.
- Experience tuning stream-processing, batch, or ML platforms (e.g., Argo Workflows, PyTorch).
- Familiarity with microservices debugging and distributed tracing (OpenTelemetry, Prometheus).