Senior Full Stack LLM Engineer - Training
Cerebras Systems
Job Description
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.
Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.
Thanks to its groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.
About the Role
We are seeking a versatile and experienced engineer to join our SOTA Training Platform team. This team is responsible for rapidly bringing up state-of-the-art open-source models (such as LLaMA and Qwen) and customer-provided proprietary models on our Cerebras CSX systems. Success in this role requires a systems-minded generalist who thrives in fast-paced bring-up environments and is comfortable working across the entire Cerebras software stack.
Your work will play a critical role in achieving unprecedented levels of performance, efficiency, and scalability for AI applications.
Responsibilities
- Contribute to the end-to-end bring-up of ML models on Cerebras CSX systems.
- Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning.
- Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization.
- Propose and prototype improvements to tools, APIs, and automation flows to accelerate future bring-ups.
Qualifications
- Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field.
- 5+ years of relevant industry experience (internship/co-op experience included).
- Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
- Strong debugging skills across performance, numerical accuracy, and runtime integration.
- Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with model internals (e.g., attention, MoE, diffusion).