About the role
The Microsoft Azure Artificial Intelligence/High Performance Computing (AI/HPC) team is looking for systems engineers, architects, and thought leaders to help customers deploy, monitor, profile, and debug their applications on hyperscale cloud infrastructure. Azure enables the largest supercomputing deployments to tackle complex computational problems in the public cloud, as evidenced by the HPC products that have already made their mark on the Top500, MLPerf, and Graph500 rankings.
The AI Systems Architect will work on all aspects of inference and training systems, focusing on system design, data center planning, and modeling of workloads on multiple GPU SKUs. The work will entail reliability modeling, lifecycle modeling and analysis of workloads and GPUs, GPU planning, analytical design of systems, and workload assignment. The successful candidate will focus on deep LLM modeling and on disseminating results across cross-functional orgs to build a better understanding of software-hardware codesign features. The candidate must be able to demonstrate deep knowledge of AI systems and architectures for both training and inference SKUs across multi-vendor and multi-generational GPUs and models.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
- Partners with appropriate stakeholders to determine user requirements for a set of scenarios.
- Leads identification of dependencies and the development of design documents for a product, application, service, or platform.
- Leads by example and mentors others to produce extensible and maintainable code used across products.
- Leverages subject-matter expertise of cross-product features with appropriate stakeholders (e.g., project managers) to drive multiple groups' project plans, release plans, and work items.
- Holds accountability as a Designated Responsible Individual (DRI), mentoring engineers across products/solutions, working on-call to monitor system/product/service for degradation, downtime, or interruptions.
- Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products, while also driving consistency in monitoring and operations at scale, and shares knowledge with other engineers.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
Other Qualifications:
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
- 5+ years of experience in the system design space.
Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 - $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
Aplyr's read
Microsoft is a technology giant shaping global computing, attracting talent interested in innovation and impact across software, cloud, and AI sectors.
What's promising
- Strong commitment to innovation in AI and cloud computing.
- Diverse range of roles from technical to managerial, offering career growth.
- Financially stable with a robust global presence and market influence.
What to watch
- Complex organizational structure can slow decision-making processes.
- High competition for roles may limit entry-level opportunities.
- Potential for work-life balance challenges in high-demand positions.
Why Microsoft
- Pioneer in personal and business computing with a legacy of innovation.
- Extensive global reach with a diverse workforce and inclusive culture.
- Leader in integrating AI into mainstream software and cloud services.
Aplyr’s read is generated by AI from public sources.
About Microsoft
Microsoft is a global technology company that develops, licenses, and supports a wide range of software products, services, and devices. Known for its Windows operating system and Office productivity suite, Microsoft has a significant impact on personal and business computing worldwide.
- Founded: 1975