Manager, Software Engineering - AIOps

NVIDIA · Semiconductors

Posted

4 days

01

About the role

NVIDIA is at the forefront of the AI revolution, and the AIOps department is critical to ensuring our AI-driven data centers operate with unmatched efficiency. We are looking for a visionary, hands-on Software Engineering Manager to lead a team building the next generation of AI-based monitoring and operation platforms.

This role focuses on leveraging AI Agents to automate, predict, and optimize data center performance at internet scale. If you are a resilient leader who excels in fast-paced environments and has a passion for autonomous system operations, we want you on our team.

What You’ll Be Doing:

  • Strategic Roadmap Development: Define software design and implementation roadmaps for AI-driven operations, ensuring data center availability, resiliency, and performance through autonomous agent-based monitoring.

  • Innovative AIOps Engineering: Lead the development of tools and proof-of-concepts focused on software-defined operations, utilizing AI agents to automate root cause analysis and proactive remediation.

  • Scalable Architecture: Build and scale monitoring applications that handle massive telemetry data from AI infrastructure across public, private, and hybrid cloud environments.

  • Agentic Frameworks: Oversee the integration of LLM-based agents into CI/CD and operational workflows to shift from reactive monitoring to predictive orchestration.

  • Team Leadership: Actively hire, mentor, and grow a high-performing engineering team, fostering a culture of technical excellence and creative problem-solving.

  • Customer Engagement: Directly contribute to internal and external customer engagements to align AIOps solutions with real-world data center challenges.

What We Need to See:

  • BS/MS degree in Computer Science or a related technical field (or equivalent experience).

  • 8+ years of overall software engineering experience, with at least 2 years in a management or technical lead role.

  • Domain Expertise: 3+ years of experience in system software engineering for large-scale production systems, with a strong background in Solution Design and Distributed Systems.

  • Cloud Native Mastery: Deep experience with Docker and Kubernetes orchestration, alongside PaaS or IaaS cloud platforms.

  • Programming Proficiency: Strong programming skills in Python (essential for AI/ML workflows) and Go.

  • Operational Intelligence: Extensive knowledge of CI/CD pipelines and automated software-defined operations.

  • Exceptional written and verbal communication skills to bridge the gap between complex AI logic and operational requirements.

Ways to Stand Out from the Crowd:

  • AI/ML Background: Experience building or deploying AI Agents (LangChain, AutoGPT) or using ML models for anomaly detection and predictive analytics.

  • Infrastructure Knowledge: Familiarity with Ethernet switching, networking protocols, or NVIDIA’s hardware stack (GPUs/DPUs).

  • Control Systems: Experience in developing autonomous systems or closed-loop feedback monitoring tools.

  • SaaS Background: Proven track record of managing and scaling cloud-based SaaS applications.

02

Aplyr's read

NVIDIA is a pioneering force in GPUs and AI, attracting top talent in engineering and innovation-driven roles across various tech domains.

Synthesized from recent postings & public sources

What's promising

  • NVIDIA leads the GPU market, crucial for gaming and AI applications.
  • The company invests heavily in AI and deep learning, driving technological advancements.
  • NVIDIA's strong market position offers stability and growth opportunities for employees.

What to watch

  • High competition in the semiconductor industry can impact market share.
  • Rapid technological changes require constant adaptation and learning.
  • Intense workload and high expectations may affect work-life balance.

Why NVIDIA

  • NVIDIA's GPUs are industry benchmarks in gaming and professional graphics.
  • The company's AI research is at the forefront of deep learning innovation.
  • NVIDIA's culture emphasizes cutting-edge technology and engineering excellence.

Aplyr’s read is generated by AI from public sources.

03

About NVIDIA

NVIDIA is a leading technology company known for its graphics processing units (GPUs) for gaming and professional markets, as well as its advancements in artificial intelligence and deep learning.
