About the role
Who We Are
Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction.
Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.
We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
What You'll Do
- At the direction of the Manager of Infrastructure Operations, design, build, and roll out new platforms and patterns that minimize incidents and enable customer-facing and internal features.
- Deploy updates and improvements that support both Voltage Park’s internal use cases and those of its end customers.
- Collaborate with colleagues in Infrastructure Engineering, Network Operations, Customer Success, and the Software and Platform Development teams.
- Participate in the on-call rotation, which is shared evenly across all team members in a primary/secondary pattern: you serve as primary, then rotate to secondary.
What You Will Need
Required Qualifications
- 8+ years working with Linux as a server/hosting platform; extra points for Ubuntu experience.
- 5+ years of experience with AWS.
- 2+ years of experience with Kubernetes and strong container fundamentals.
- 2+ years of experience with Terraform and Ansible.
- 2+ years of experience managing network-attached storage (NFS, Ceph, or similar). Extra points for experience with VAST storage systems.
- Experience with monitoring systems (Prometheus, ELK stack).
- Familiarity with GitOps workflows.
- Software development experience in Python, Go, Bash, or other languages, used to automate tasks and connect systems and APIs.
- Deep networking fundamentals; extra points for experience with datacenter-scale networks, 400Gb Ethernet, and InfiniBand.
- Experience building and delivering complex systems.
- Effective at navigating tradeoffs between design, risk, cost, and outcomes.
- Comfortable navigating ambiguity.
- Strong written and oral communication.
Nice-to-Haves
- Experience with bare-metal hardware troubleshooting and provisioning; extra points for working with Dell hardware.
- Experience with GPU servers, either on bare metal or under virtualization.
- Deep experience with network switches, routers, and firewalls, particularly SONiC switches, Palo Alto Networks firewalls, and Juniper Networks equipment.
- Experience with VAST storage systems.
Compensation
We are committed to offering competitive compensation that reflects the value each team member brings to our mission. Final offers are based on factors such as experience, skills, geographic location, and role expectations. In addition to base salary, our total rewards package for eligible roles includes a discretionary bonus, a meaningful equity component, and comprehensive benefits.
Benefits and Perks
We offer a comprehensive and competitive benefits package designed to support our employees’ health, well-being, and long-term success. Benefits may vary by location, team, and role.
Benefits include:
- Comprehensive medical, dental and vision coverage (U.S.); Private medical and dental insurance (U.K.)
- Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
- Generous paid time off, plus holidays
- Paid parental leave
- Professional development support
- Wellness and work-from-home stipends
- Flexible work environment
At Lightning AI, we are committed to fostering an inclusive and diverse workplace. We believe that diverse teams drive innovation and create better products. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. We are dedicated to building a culture where everyone can thrive and contribute to their fullest potential.
Aplyr's read
Lightning AI empowers developers with cutting-edge tools to streamline AI model creation, attracting tech-savvy professionals passionate about AI innovation.
What's promising
- Lightning AI provides a comprehensive platform for AI model development, reducing complexity for developers.
- The company is at the forefront of making AI accessible to businesses of all sizes.
- Recent hiring trends indicate a focus on expanding technical and security capabilities.
What to watch
- The AI market is highly competitive, posing challenges for differentiation.
- Rapid technological changes require constant adaptation and innovation.
- Limited public information about company culture and employee satisfaction.
Why Lightning AI
- Lightning AI focuses on streamlining the AI development process, a niche not all competitors address.
- The platform's infrastructure supports diverse AI applications, enhancing versatility.
- A strong emphasis on security and observability in AI infrastructure sets it apart.
Aplyr’s read is generated by AI from public sources.
About Lightning AI
Lightning AI is a technology company that specializes in tools and infrastructure for building and deploying AI applications. Its platform helps developers streamline the creation of machine learning models, making AI more accessible and efficient for businesses.