Mid-Level

Systems Engineer, Kernel (Networking)

CoreWeave

Compensation

$153,000 - $242,000/year

Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA
Hybrid
Posted March 25, 2026

Job Description

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

Senior Systems Engineer, Kernel Networking

CoreWeave is seeking a specialized Kernel Networking Engineer to join our HAVOCK Team. In this role, you will be the subject matter expert for the networking subsystem of CoreWeave’s Linux-based infrastructure. As we scale our massive AI/HPC clusters, you will focus on optimizing the datapath, tuning the TCP/IP and RDMA stacks, and ensuring the stability of high-throughput workloads across NVIDIA, Mellanox, and Broadcom hardware.

HAVOCK stands for Hardware, Acceleration, Virtualization, Operating Systems, Containerization, and Kubelet.

Our Team’s Stack:

  • Python, Go, bash/sh, C
  • Custom Linux kernel, Ubuntu
  • Debug Tools: crash, kdump, drgn, gdb
  • Prometheus, VictoriaMetrics, Grafana, Loki
  • Docker, Kubernetes (k8s), KubeVirt

Focus Areas:

  • Holistic Troubleshooting – Act as the first line of defense for complex system crashes, soft lockups, and kernel panics.
  • Cross-Domain Debugging – Identify whether a root cause lies in memory management, storage, or the network layer.
  • Incident Response – Reduce mean time to resolution (MTTR) by quickly analyzing crash dumps and stack traces.
  • Reliability Engineering – Contribute to the "Smarter Triaging" initiative to automate crash analysis.
  • Fleet Stability – Ensure kernel support across diverse hardware (CPUs, GPUs, DPUs).

Responsibilities:

  • Analyze kernel crashes, oopses, and panics across the entire stack.
  • Apply networking expertise to troubleshoot issues with NVIDIA/Mellanox and Broadcom NICs.
  • Use crash dump analysis tools (kdump, crash, drgn) to triage issues affecting customer workloads.
  • Improve documentation and RCA processes for kernel failures.
  • Assist in maintaining kernel builds and CI/CD pipelines to streamline testing.

Requirements:

  • 5+ years of experience in systems-level development or kernel engineering.
  • Broad Kernel Knowledge: Solid grasp of memory management, scheduling, and filesystems.
  • Networking Fluency: Proven track record troubleshooting RoCE, InfiniBand (IB), and RDMA issues.
  • Debugging Mastery: Expert command of standard kernel debugging utilities and a systematic approach to root-cause analysis.
  • Excellent verbal and written communication skills (ability to explain complex kernel bugs to stakeholders).

Nice-to-haves:

  • Experience with eBPF for troubleshooting.
  • Knowledge of GPU/NVLink architectures.
  • Experience working with automated monitoring/alerting systems (Grafana, Jira automation).
  • Willingness to present at conferences (LPC, LSFMM+BPF).

Our compensation reflec