Senior Hardware Engineer, GPU & PCIe
CoreWeave
Compensation
$150,000 - $242,000/year
Job Description
About the Role:
CoreWeave is seeking a highly skilled and motivated Infrastructure/Hardware Engineer focused on GPU and PCIe troubleshooting to join our Hardware Engineering team, reporting to the Hardware Engineering Manager. In this role, you will play a crucial part in the design, development, troubleshooting, and optimization of our server hardware infrastructure. You will collaborate closely with cross-functional teams, external vendors, and stakeholders to ensure the successful delivery of highly performant and reliable hardware solutions.
What You'll Do:
- Troubleshoot complex GPU- and PCIe-related failures.
- Partner with external vendors on failure analysis.
- Track component RMAs.
- Develop and maintain hardware/firmware management services.
- Automate all aspects of the server hardware lifecycle.
- Serve as the senior point of contact for hardware escalation and troubleshooting.
- Collaborate with cross-functional teams to define hardware requirements, specifications, and system architecture, and to write playbooks for identifying and resolving issues.
- Create and maintain accurate documentation of hardware designs, specifications, test procedures, and results.
- Analyze and optimize the performance of hardware systems, identify bottlenecks, and propose improvements for enhanced efficiency.
- Establish processes for internal hardware testing, deployment, performance optimization and troubleshooting.
Who You Are:
- 5+ years of experience supporting and troubleshooting data-center-class GPUs (H100 or newer, including InfiniBand and NVLink).
- Proficiency in Ansible and Python, and experience interacting programmatically with server BMCs using IPMI or Redfish (preferably Redfish).
- Experience using, integrating, and automating data-center-class GPU diagnostic and troubleshooting tools, including observability platforms such as Prometheus and Grafana.
- In-depth knowledge of server hardware, components, and management technologies, particularly GPUs and PCIe devices.
- Proven ability to stay updated with the latest industry technologies and trends.
- Previous experience collaborating with hardware vendors to identify novel issues, generate operational playbooks, create alerts, and drive issue resolution to completion.
- Strong passion for automation, with a commitment to automating processes comprehensively.
- Excellent documentation skills and attention to detail.
- Strong analytical and problem-solving abilities.
Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match.
Why CoreWeave?
At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afrai