About Hark
Hark is an artificial intelligence company building advanced, personalized intelligence: one that is proactive, multimodal, and capable of interacting with the world through speech, text, vision, and persistent memory.
We're pairing that intelligence with next-generation hardware to create a universal interface between humans and machines. While today's AI largely operates through chat boxes and decade-old devices, Hark is focused on what comes next: agentic systems that interact naturally with people and the real world.
To get there, we're developing multimodal models and next-generation AI hardware together, designed from the ground up as a single, unified interface for a new era of intelligent systems.
About the Role
We are looking for a Member of Technical Staff, Infrastructure Compute to lead and manage large-scale GPU computing clusters powering our AI training and deployment workloads. You'll work at the intersection of systems engineering and machine learning infrastructure, owning the reliability, scalability, and efficiency of the compute platform that our research and engineering teams depend on. This is a high-impact, highly technical role suited for someone who thrives in complex distributed systems environments and cares deeply about infrastructure as a product.
Responsibilities
- Design, implement, and maintain Infrastructure as Code (IaC) best practices to enable repeatable, auditable, and scalable cluster provisioning.
- Enhance and harden CI/CD deployment pipelines to ensure robust, secure, and low-latency model service delivery across production environments.
- Own and evolve stable training infrastructure operating at the scale of 10,000+ GPUs, including job scheduling, fault tolerance, and network fabric optimization.
- Partner closely with ML researchers and engineers to understand compute bottlenecks and translate them into infrastructure improvements.
- Monitor system health, define SLOs, and lead incident response for critical training and inference workloads.
- Drive capacity planning, cost efficiency initiatives, and hardware lifecycle management across the GPU fleet.
- Contribute to internal tooling and platform abstractions that improve developer experience for teams consuming compute resources.
Requirements
- 5+ years of experience in infrastructure, systems, or platform engineering, with at least 2 years working in ML or HPC environments.
- Demonstrated experience managing GPU clusters or large-scale distributed compute infrastructure.
- Strong proficiency in at least one systems or infrastructure programming language.
- Deep understanding of networking fundamentals (RDMA, InfiniBand, or RoCE a plus) relevant to high-throughput training workloads.
- Experience with container orchestration, job scheduling, and multi-tenant resource management.
- Proven track record owning production systems with high reliability requirements.
- Strong debugging and observability skills across the full infrastructure stack.
Bonus Qualifications
- Kubernetes (K8s), particularly experience operating large, GPU-aware clusters.
- Pulumi or similar modern IaC tooling.
- Rust and/or Go for systems-level tooling and performance-critical services.
- Familiarity with PyTorch and Ray for understanding workload patterns and integration requirements.
Compensation
The US base salary range for this full-time position is $180,000 to $450,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components and benefits depending on the specific role. This information will be shared if an employment offer is extended.
Aplyr's read
Hark leverages AI-driven data analytics to deliver business insights, attracting a diverse team of engineers, designers, and technical experts.
What's promising
- Hark's focus on AI and machine learning positions it at the forefront of data-driven business solutions.
- The company offers diverse roles, from engineering to creative social leads, indicating a broad scope of operations.
- Hark's recent hires in specialized technical fields suggest a commitment to cutting-edge technology and innovation.
What to watch
- The competitive landscape in AI analytics could challenge Hark's market share and growth.
- Limited public information about Hark's financial health and long-term sustainability.
- Potentially high-pressure environment due to the fast-paced nature of AI and tech development.
Why Hark
- Hark's integration of AI with multimodal capabilities sets it apart in data analytics.
- The company's emphasis on both technical and creative roles highlights a balanced approach to innovation.
- Hark's recruitment of niche technical experts suggests a focus on specialized, advanced technology solutions.
Aplyr's read is generated by AI from public sources.
About Hark
Hark is a data analytics platform that specializes in providing insights for businesses through the use of artificial intelligence and machine learning.