Distributed Systems Cluster Security Software – Engineering Lead
Cerebras Systems
Compensation
$140,000 - $240,000/year
Job Description
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.
Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of compute, transforming key workloads with ultra-high-speed inference.
Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.
About The Role
In this role, you will be the security czar for Cerebras's AI cluster product. These AI clusters contain hundreds of wafer-scale accelerator systems, thousands of high-end servers, and several thousand network ports across switches, plus network-attached storage, all housed in a large-scale datacenter.
You will ensure that Cerebras's large-scale AI clusters are secured through first-principles, security-first engineering and industry best practices. A Cerebras cluster involves complex hardware components, networking, and a vertically integrated cluster management software stack. That stack spans everything from the bare-metal deployment that brings up an operational cluster to the suite of cluster management software that lets multi-tenant, higher-level training and inference services run on these large clusters.
Your role will be to ensure both end-to-end security and privacy for these cluster use cases. You will develop security engineering solutions that provide the necessary network access controls, user access controls, and a world-class multi-tenancy solution.
Responsibilities
- Be the primary engineering face and owner of cluster security.
- Provide strong technical leadership in cluster security for the company.
- Actively work with corporate security and customers to identify and define needed security enhancements.
- Build engineering-driven software that provides guardrails, detection, and response tools for vulnerabilities at every layer of the vertical stack (hardware and software).
- Drive cross-functional collaboration, both vertically and horizontally, to ensure the end-to-end cluster software is secure.
- Develop, maintain, and execute the roadmap for the cluster security product.
- Build an outstanding engineering team to deliver world-class security solutions.
Skills & Qualifications
- 3+ years in a demonstrated engineering leadership or management role in distributed systems security.
- Proven track record of delivering products and of launching and deploying secure distributed solutions to customers.
- Excellent communication, articulation, and collaboration skills, and the ability to act as a stakeholder.
- Ability to make tough decisions backed by data and trade-off analysis.
- Outstanding sense for product and user journeys; an out-of-the-box thinker.
- Outstanding roadmap and schedule execution skills under tight timelines and budgets.
- Strong background in multi-tenancy of large-scale clusters is necessary.
- Strong technical experience in computer and cluster networks is necessary.
- Strong technical background in distributed systems software development (K8s and its ecosystem) is preferred.
- Technical experience with bare metal cluster management software and r