Mid-Level

Operations Engineer, HPC Networking

CoreWeave

Compensation

$110,000 - $179,000/year

Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA
Hybrid
Posted April 17, 2026

Job Description

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

What You’ll Do:

In this role, you will support the deployment, monitoring, troubleshooting, and maintenance of large-scale InfiniBand fabrics, ensuring their stability and performance. The ideal candidate will have a strong operations mindset, effective collaboration skills, and the ability to solve complex issues in a dynamic environment.

  • Regularly monitor the performance and health of InfiniBand fabrics, including switches, host adapters, and nodes.
  • Investigate and resolve operational issues within InfiniBand fabrics, such as network connectivity problems and performance bottlenecks.
  • Assist with the installation and operational bring-up of large InfiniBand fabrics in collaboration with onsite personnel and customer teams.
  • Perform routine maintenance and upgrades on InfiniBand switches and control plane components.
  • Collaborate with HPC cluster operations teams to provide troubleshooting and operational expertise.

Investing in our people is one of our top priorities, and we value candidates who can bring their diverse experiences to our teams. Here are some qualities we’ve found compatible with our team. We’d love to talk about whether this aligns with your experience and interests and what you’re excited to work on next.

Who You Are:

Minimum Qualifications

  • At least 1 year of experience with InfiniBand or similar networking technologies.
  • Solid understanding of networking concepts, including architectures, topologies, operational best practices, and troubleshooting.
  • Experience with Linux system administration and maintenance.
  • Proficiency in at least one scripting language.

Preferred Qualifications

  • Hands-on experience with NVIDIA UFM or similar fabric management tools.
  • Familiarity with the Slurm job scheduler and its role in HPC environments.
  • Experience with monitoring and visualization platforms such as Grafana or Prometheus.
  • Experience with operational tooling and automation frameworks like Ansible.
  • Knowledge of data center operations, including server racks and cabling.
  • Python or Bash scripting.

Why CoreWeave?

At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:

  • Be Curious at Your Core
  • Act Like an Owner
  • Empower Employees
  • Deliver Best-in-Class Client Experiences
  • Achieve More Together

We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization's growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us! 