Verified active · 17h ago

Member of Technical Staff - GPU Infrastructure

Prime Intellect · Artificial Intelligence

Compensation

$150 - 300K

Apply effort

~7 min

Ashby

Posted

101 days

01

About the role

Building Open Superintelligence Infrastructure

Prime Intellect is building the open superintelligence stack, from frontier agentic models to the infrastructure that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

As our Solutions Architect for GPU Infrastructure, you'll be the technical expert who transforms customer requirements into production-ready systems capable of training the world's most advanced AI models.

We recently raised $15M in funding ($20M raised in total), led by Founders Fund, with participation from Menlo Ventures and prominent angels including Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao (Chief Scientific Officer of Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), Emad Mostaque (Stability AI), and many others.

Core Technical Responsibilities

This customer-facing role combines deep technical expertise with hands-on implementation. You'll be instrumental in:

Customer Architecture & Design

  • Partner with clients to understand workload requirements and design optimal GPU cluster architectures

  • Create technical proposals and capacity planning for clusters ranging from 100 to 10,000+ GPUs

  • Develop deployment strategies for LLM training, inference, and HPC workloads

  • Present architectural recommendations to technical and executive stakeholders

Infrastructure Deployment & Optimization

  • Deploy and configure orchestration systems including SLURM and Kubernetes for distributed workloads

  • Implement high-performance networking with InfiniBand, RoCE, and NVLink interconnects

  • Optimize GPU utilization, memory management, and inter-node communication

  • Configure parallel filesystems (Lustre, BeeGFS, GPFS) for optimal I/O performance

  • Tune system performance from kernel parameters to CUDA configurations

Production Operations & Support

  • Serve as primary technical escalation point for customer infrastructure issues

  • Diagnose and resolve complex problems across the full stack - hardware, drivers, networking, and software

  • Implement monitoring, alerting, and automated remediation systems

  • Provide 24/7 on-call support for critical customer deployments

  • Create runbooks and documentation for customer operations teams

Technical Requirements

Required Experience

  • 3+ years hands-on experience with GPU clusters and HPC environments

  • Deep expertise with SLURM and Kubernetes in production GPU settings

  • Proven experience with InfiniBand configuration and troubleshooting

  • Strong understanding of NVIDIA GPU architecture, CUDA ecosystem, and driver stack

  • Experience with infrastructure automation tools (Ansible, Terraform)

  • Proficiency in Python, Bash, and systems programming

  • Track record of customer-facing technical leadership

Infrastructure Skills

  • NVIDIA driver installation and troubleshooting (CUDA, Fabric Manager, DCGM)

  • Container runtime configuration for GPUs (Docker, containerd, Enroot)

  • Linux kernel tuning and performance optimization

  • Network topology design for AI workloads

  • Power and cooling requirements for high-density GPU deployments

Nice to Have

  • Experience with 1000+ GPU deployments

  • NVIDIA DGX, HGX, or SuperPOD certification

  • Distributed training frameworks (PyTorch FSDP, DeepSpeed, Megatron-LM)

  • ML framework optimization and profiling

  • Experience with AMD MI300 or Intel Gaudi accelerators

  • Contributions to open-source HPC/AI infrastructure projects

Growth Opportunity

You'll work directly with customers pushing the boundaries of AI, from startups training foundation models to enterprises deploying massive inference infrastructure. You'll collaborate with our world-class engineering team while having direct impact on systems powering the next generation of AI breakthroughs.

We value expertise and customer obsession. If you're passionate about building reliable, high-performance GPU infrastructure and have a track record of successful large-scale deployments, we want to talk to you.

Apply now and join us in our mission to democratize access to planetary-scale computing.

Compensation

Cash compensation range of $150-300K, plus equity incentives

02

Aplyr's read

Prime Intellect is a cutting-edge AI company attracting talent interested in enhancing human decision-making through technology. Ideal for those passionate about advanced AI solutions.

Synthesized from recent postings & public sources

What's promising

  • Prime Intellect focuses on developing AI that enhances human decision-making, showing commitment to impactful technology.
  • The company hires across diverse roles, indicating growth and a broad scope of AI projects.
  • Open applications for unconventional talent suggest a culture open to innovative ideas and diverse backgrounds.

What to watch

  • Limited public information about company culture and work-life balance may concern potential applicants.
  • Rapid growth might lead to challenges in maintaining a cohesive company culture.
  • The niche focus on AI could limit opportunities for those interested in broader tech roles.

Why Prime Intellect

  • Prime Intellect's emphasis on AI for enhancing human capabilities sets it apart in the tech landscape.
  • The company's diverse hiring strategy, including roles in AI infrastructure and legal, highlights its comprehensive approach.
  • Offering a role for unconventional talent indicates a unique openness to non-traditional career paths.

Aplyr’s read is generated by AI from public sources.

03

About Prime Intellect

Prime Intellect is a technology company focused on developing advanced AI solutions to enhance human capabilities and decision-making processes.
