Principal Software Engineering Manager - AI Frameworks
Microsoft
Compensation
$139,900 - $304,200/year
Job Description
As a Principal Software Engineering Manager - AI Frameworks on the team, you will lead and grow a group of engineers working across multiple layers of the AI software serving stack, including fundamental abstractions, runtimes, libraries, and application programming interfaces (APIs). You will be responsible for setting technical direction, prioritizing investments, and ensuring the team delivers high-impact performance improvements that enable large-scale model training and inference.
In this role, you will guide the team’s work on benchmarking OpenAI and other large language models (LLMs) across GPUs and Microsoft hardware, driving performance optimization, monitoring regressions, and accelerating time-to-deployment. You will partner closely with researchers, product teams, and platform owners to translate performance insights into production-ready improvements that reduce hardware footprint and support Microsoft Azure’s capex efficiency goals.
Responsibilities
- Lead and develop a team of engineers working across multiple layers of the AI software stack to enable large-scale training and inference.
- Set technical vision and execution strategy for model performance benchmarking, optimization, and deployment across GPUs and Microsoft hardware.
- Drive performance outcomes by prioritizing and overseeing efforts to benchmark, profile, debug, and optimize training and inference workloads.
- Own performance health by establishing mechanisms to monitor regressions, measure impact, and continuously improve time-to-deploy and hardware efficiency.
- Partner cross-functionally with research, product, infrastructure, and hardware teams to deliver scalable, production-ready AI performance improvements.
- Balance short-term delivery and long-term investments, ensuring the team’s work aligns with organizational goals, platform roadmaps, and Azure capex objectives.
- Build a strong engineering culture through coaching, feedback, hiring, and career development, enabling the team to operate with increasing autonomy and impact.
Qualifications
Minimum/Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Preferred:
- Master’s Degree in Computer Science or related technical field AND 10+ years of software engineering experience, including 6+ years in engineering management, OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years of software engineering experience, including 6+ years in engineering management, OR equivalent experience.
- Strong technical foundation in software engineering principles, computer architecture, GPU architecture, and hardware acceleration for neural networks, with the ability to guide teams working in these areas.
- Experience leading teams responsible for end-to-end performance analysis and optimization of LLMs, AI systems, or HPC workloads, including use of GPU profiling and performance analysis tools.
- Demonstrated ability to lead cross-team initiatives, align stakeholders, and translate research or platform capabilities into scalable, production-ready solutions.
- Proven people leadership skills, including hiring, coaching, performance management, and career development, with a track record of building high-performing, inclusive teams.
- Exposure to AI / ML infrastructure, including DNN or LLM training and/or inference systems, and experience with at least one modern deep learning framework (e.g., PyTorch, TensorFlow, ONNX Runtime).
- Familiarity with GPU software stacks and acceleration technologies such as CUDA, ROCm, Triton, or equivalent, sufficient to guide technical direction and evaluate tradeoffs.
Software Engineering M5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area, where the base pay range for this role is USD $188,000 - $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.