About the role

Aplyr's Quick Take

This role is focused on optimizing and building distributed training systems for machine learning models, specifically using PyTorch. You'll be hands-on with low-level coding and performance tuning, working primarily as an individual contributor rather than in a managerial capacity.

Good fit

Ideal candidates have 8 or more years of experience in distributed systems or high-performance computing, with strong skills in Python and low-level optimizations. A system-level mindset and a proactive approach to problem-solving will help you thrive here.

Worth noting

The role demands deep technical expertise, particularly in CUDA and GPU optimization, which may be a high bar for some applicants. The focus on performance tuning suggests a fast-paced environment with potentially high expectations.

What You’ll Do

Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels
Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization
Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks
Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking
Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures

What You’ll Bring

Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)
Production-grade expertise in Python
Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization
Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism
System-level mindset with a track record of tuning hardware–software interactions for maximum utilization

Skills & Tags

node python ai data product design

Aplyr's read

Genesis AI is a cutting-edge AI company focusing on foundational and applied AI technologies, attracting top technical talent for its global operations.
Synthesized from recent postings & public sources

What's promising

•Genesis AI is at the forefront of AI innovation, specializing in foundational models and robotics.
•The company offers diverse roles across global tech hubs, indicating robust growth and international expansion.
•Genesis AI's focus on foundational models suggests a commitment to long-term technological leadership.

What to watch

•Limited public information about company culture and employee satisfaction could concern potential applicants.
•High concentration of technical roles may limit opportunities for non-technical professionals.
•The competitive AI industry landscape poses challenges for sustained differentiation and market share.

Why Genesis AI

•Genesis AI's emphasis on foundational models and robotics sets it apart in the AI sector.
•The company has a strong international presence, with roles in major tech cities like Paris, London, and the Bay Area.
•Genesis AI's focus on both foundational and applied AI technologies provides diverse career pathways.

Aplyr’s read is generated by AI from public sources.