About the role
About us
Graphcore is one of the world’s leading innovators in Artificial Intelligence compute.
It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.
As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world’s most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.
Graphcore’s teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.
Job Summary
We are seeking a Senior Principal Network Engineer to help design, deploy, and optimize next‑generation AI data center networks. AI training and inference workloads require extremely high bandwidth, deterministic low latency, and zero‑packet‑loss networking environments.
In this role, you will partner closely with the Network Architecture Lead to design and scale high‑performance computing (HPC) network fabrics supporting GPU clusters. You will work across hardware, networking, and AI application layers to ensure Graphcore’s large‑scale AI infrastructure operates at peak performance.
The ideal candidate brings deep experience operating hyperscale or HPC data center networks and has expertise in high‑speed Ethernet fabrics, RDMA technologies, advanced automation, and telemetry systems.
The Team
The Data Center Network Engineering team designs and operates the high‑performance network fabrics that power Graphcore’s AI compute platforms. The team collaborates closely with hardware engineering, AI researchers, and infrastructure teams to build scalable networking environments optimized for distributed training and inference workloads.
Engineers work on pioneering technologies including high‑speed Ethernet fabrics, lossless networking, RDMA transport, and large‑scale automation frameworks to support next‑generation AI clusters.
Responsibilities and Duties
• Assist in defining ultra‑high‑bandwidth, non‑blocking AI network fabrics (Clos spine‑leaf‑super‑spine architectures) for large‑scale distributed AI workloads.
• Optimize performance of lossless Ethernet fabrics using congestion control mechanisms such as PFC, ECN, and DCQCN to support RDMA/RoCEv2 communication.
• Lead initiatives to implement NetDevOps practices and develop automation for provisioning, configuration management, and network remediation.
• Design and deploy high‑resolution telemetry pipelines to monitor network health, detect microbursts, and analyze congestion patterns.
• Support modeling, deployment, configuration, and monitoring of data center network fabrics including scale‑out, scale‑up, and front‑end networks.
• Collaborate cross‑functionally with hardware engineers, AI researchers, and data center operations teams to co‑design high‑performance infrastructure.
• Provide technical leadership and mentorship to network engineers while establishing best practices and operational standards.
• Contribute to the long‑term networking strategy and roadmap for Graphcore’s AI infrastructure.
• Research and evaluate next‑generation high‑speed networking technologies and vendor solutions.
Candidate Profile
Essential
• BS, MS, or equivalent experience in Computer Science, Electrical Engineering, Network Engineering, or a related technical discipline.
• 12+ years of progressive network engineering experience with at least 3 years in hyperscale, high‑density, or HPC data center environments.
• Expert‑level knowledge of data center routing and switching protocols including BGP, OSPF, and EVPN‑VXLAN architectures.
• Strong operational understanding of RDMA networking technologies such as RoCEv2 or InfiniBand.
• Hands‑on experience with modern merchant‑silicon switching platforms and network operating systems (NOS) such as Arista EOS, Cisco NX‑OS, or SONiC.
• Experience deploying high‑speed network technologies including 400G/800G optics and large‑scale fabric architectures.
• Proficiency in automation and scripting languages such as Python, Go, Bash, or similar tools.
• Strong collaboration and communication skills across cross‑functional engineering teams.
Desirable
• Experience operating large‑scale AI or GPU clusters.
• Familiarity with network telemetry frameworks and streaming analytics.
• Experience implementing NetDevOps workflows and infrastructure automation pipelines.
• Experience influencing vendor roadmaps or evaluating next‑generation networking technologies.
In addition to a competitive salary, Graphcore offers flexible working and a comprehensive benefits package designed to support your health, wellbeing and financial future. Our benefits include medical, dental and vision coverage, Flexible Spending Accounts (FSAs), Health Savings Accounts (HSAs), disability and life insurance, a 401(k) retirement plan, commuter benefits, wellness services and an Employee Assistance Programme (EAP).
We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.
Aplyr's read
Graphcore is a cutting-edge semiconductor company focusing on AI and machine learning hardware, appealing to engineers interested in high-performance computing and innovative chip design.
What's promising
• Graphcore's IPU technology is at the forefront of AI hardware innovation.
• The company has secured significant funding, indicating strong investor confidence.
• Graphcore's partnerships with major tech firms enhance its industry credibility.
What to watch
• Graphcore faces intense competition from established players like NVIDIA.
• The company's financials are not publicly available, limiting transparency.
• Market adoption of IPU technology remains uncertain in the face of GPU dominance.
Why Graphcore
• Graphcore's IPU is specifically designed for AI workloads, unlike traditional CPUs and GPUs.
• The company emphasizes a software ecosystem tailored to its hardware.
• Graphcore is known for its collaborative culture and focus on cutting-edge research.
Aplyr’s read is generated by AI from public sources.