Senior Software Development Engineer in Test (SDET) - AI Cluster
Cerebras Systems
Job Description
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.
Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of compute capacity, transforming key workloads with ultra-high-speed inference.
Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.
In the AI Infrastructure organization, key focus areas include simplifying large hardware deployments with push-button tooling, a single pane of glass for observability and monitoring, and software capabilities for built-in resiliency. For this Senior Software Development Engineer in Test role, we are looking for a candidate who can make a big impact on how we test and validate thousands of nodes in large deployments to ensure the cluster is 99.999% reliable.
Responsibilities
- You will innovate on and execute tests for cutting-edge AI infrastructure. Be a thinker: define optimized test strategies and methodologies.
- Cerebras is growing and innovating at a rapid pace, and so are the ML community and AI models. Be a quick learner, adapt to new technologies, and bring your expertise. We are looking to hire a team with a diverse skill set.
- Develop a deep understanding of how large-scale distributed ML training and inference work, and learn how to break these large distributed-systems challenges into smaller components that can be unit tested.