Mid-Level

Compute Server Platform Architect

Cerebras Systems

Sunnyvale CA or Toronto Canada
On-site
Posted February 18, 2026

Job Description

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.  

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

About The Role

As a Compute Server Platform Architect on the Cluster Architecture Team, you will own the server-side platform architecture that enables Cerebras CS-3-based AI clusters (training and inference) to deliver predictable performance, scalability, and reliability. Our accelerators are network-attached, so the x86 server fleet is a first-class part of the end-to-end system: it runs critical-path runtime functions (for example, orchestration, prompt caching, and IO/control services) and must be co-designed with software for token-level latency, throughput, and cost efficiency. You will translate workload behavior into CPU, memory, IO, PCIe, and host-networking requirements, drive platform evaluations with vendors, and provide technical leadership through qualification and production adoption, in close partnership with other function leaders and TPMs.

 

Responsibilities

  • Own the architecture for all server roles in Cerebras clusters, including definitions of server types, configurations, and lifecycle strategy.
  • Define and maintain server formulas (counts and ratios per CS-3 count, cluster size, and workload type) including capacity planning and headroom policy.
  • Specify platform configurations: CPU SKU and core strategy, vendor roadmap (e.g., AMD, Intel, Arm), memory topology (channels, DIMM type, capacity), PCIe topology and lane budgeting, NIC selection/placement, and local NVMe policy where applicable.
  • Translate software and runtime flows into measurable hardware requirements (CPU utilization, memory bandwidth/latency, bursty IO patterns, queueing and concurrency limits) and communicate clear guardrails back to software teams.
  • Develop performance and scaling models; validate with microbenchmarks and workload-level experiments; identify bottlenecks and drive cross-stack fixes.
  • Define the OS, BIOS, firmware, and driver baseline for each server type; other teams follow these recommendations and apply them across the fleet.
  • Stay current on emerging server technologies (CPU generations, new memory technologies, CXL, NVMe evolutions, SmartNIC/DPU capabilities where relevant) and run proof-of-concept evaluations to determine when to adopt.
  • Lead technical vendor engagements (OEM/ODM and component vendors): influence roadmap, request platform knobs, and drive joint debugging on performance or reliability issues.
  • Define qualification and acceptance criteria (performance, stability, operability) and partner with the Infrastructure Hardware TPM to execute qualification plans and land changes cleanly into production.
  • Support bring-up and occasional deployment debugging in lab and staging environments; drive root-cause analysis for regressions spanning firmware, drivers, OS, and runtime behavior.

Skills and Qualifications

  • PhD. in Computer Science or Electrical/Computer