About the role

Who We Are

Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world. With 33 global cloud data center locations, Vultr is trusted by hundreds of thousands of active customers across 185 countries for its flexible, scalable, global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage solutions. In December 2024, Vultr announced an equity financing at a $3.5 billion valuation. Founded by David Aninowsky and self-funded for over a decade, Vultr has grown to become the world’s largest privately held cloud infrastructure company.

Vultr Cares

100% company-paid insurance premiums for employee medical, dental, and vision plans
401(k) plan that matches 100% up to 4%, with immediate vesting
Professional Development Reimbursement of $2,500 each year
11 Holidays + Paid Time Off Accrual + Rollover Plan
Commitment matters to Vultr! Increased PTO at 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
$500 stipend for remote office setup in first year + $400 each following year
Internet reimbursement up to $75 per month
Gym membership reimbursement up to $50 per month
Company-paid Wellable subscription

Join Vultr

Vultr is seeking a highly skilled and experienced RMA Systems Engineer to analyze, diagnose, and resolve complex hardware and system-level failures across our cloud infrastructure platform. This role is responsible for evaluating failure patterns across GPU, CPU, and server environments, determining appropriate remediation strategies, and improving how Vultr manages hardware lifecycle and vendor interactions. The ideal candidate brings strong systems-level troubleshooting expertise and the ability to analyze non-routine technical issues, determine root cause, and make informed decisions on repair, replacement, or escalation paths.

This is a highly visible role that partners closely with Infrastructure, Engineering, and vendor teams to ensure system reliability and continuous improvement of Vultr’s RMA processes. It also requires the ability to adapt to the latest technology, help drive resolutions for never-before-seen issues, and develop documentation as the technology evolves. This is your opportunity to join our fast-growing team and leave your mark on Vultr and the future of Cloud Infrastructure.

Key Responsibilities

Analyze GPU/CPU hardware failures by evaluating system logs, performance data, firmware behavior, and workload interactions to determine root cause.
Perform advanced troubleshooting across hardware, firmware, and operating system layers to validate faults and identify appropriate remediation paths.
Serve as a technical liaison with hardware vendors, providing detailed diagnostic data, interpreting vendor responses, and influencing resolution strategies.
Determine and coordinate on-site hardware remediation activities based on technical analysis, system impact, and priority, including guiding vendors and internal teams on appropriate repair or replacement actions.
Evaluate software and firmware versions to identify compatibility issues or contributing factors to system failures.
Validate and certify system stability and readiness prior to returning hardware to production environments.
Identify recurring failure patterns and recommend improvements to RMA workflows, vendor processes, and hardware lifecycle management.
Create and maintain detailed technical documentation, including failure analysis, troubleshooting methodologies, and resolution outcomes.
Manage and contribute to technical case documentation within vendor portals, ensuring accuracy, completeness, and alignment with diagnostic findings.

Qualifications

Experience using Jira to manage, track, and document technical issues, RMA cases, and vendor escalations, including maintaining accurate and audit-ready case records.
Experience diagnosing and troubleshooting complex hardware and system-level issues in a data center or infrastructure environment.
Experience in RMA workflows, hardware lifecycle management, or infrastructure support.
Hands-on experience with NVIDIA and/or AMD GPU technologies.
Familiarity with analyzing system logs, firmware behavior, and performance metrics to determine root cause.
Experience working with hardware vendors and managing technical escalations or support cases.
Strong understanding of server hardware, system architecture, and data center operations.
Ability to analyze technical issues, apply judgment, and determine appropriate resolution paths.
Strong written and verbal communication skills, particularly in documenting technical findings and collaborating with cross-functional teams.
Experience with tools such as Jira, Confluence, and vendor support portals.
Experience working with microcloud or distributed infrastructure environments, including understanding of system architecture and hardware integration.
Experience supporting or analyzing systems within data center environments, including hardware lifecycle, performance, and reliability considerations.

Compensation

$75,000 - $90,000

Final compensation will vary depending on years of experience, background/skill set, location, and applicable laws.

[We are currently accepting applications from candidates residing in the following states: Alabama, Arizona, Colorado, Connecticut, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Kentucky, Louisiana, Maryland, Massachusetts, Michigan, Minnesota, Missouri, Montana, Nebraska, Nevada, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Pennsylvania, Rhode Island, South Carolina, Tennessee, Texas, Utah, Vermont, Virginia, Wisconsin.]

Inclusion & Privacy

We are an equal opportunity employer and are committed to creating an inclusive environment for all employees. We welcome applications from individuals of all backgrounds and experiences, and we prohibit discrimination based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected status under applicable laws. Vultr will consider qualified applicants with arrest or conviction records in accordance with applicable laws and will not conduct a background check until after an offer of employment has been extended and accepted.

We also take your privacy seriously. We handle personal information responsibly and follow applicable laws, including U.S. privacy rules and India’s Digital Personal Data Protection Act, 2023. Your data is used only for legitimate business purposes and is protected with proper security measures.

Where allowed by law, applicants may request details about the data we collect, access or delete their information, withdraw consent for its use, and opt out of nonessential communications. For more details, please see our Privacy Policy.

Aplyr's read

Vultr is a nimble cloud infrastructure provider attracting tech-savvy professionals who thrive in high-performance computing environments.
Synthesized from recent postings & public sources

What's promising

• Vultr offers competitive pricing, making it attractive for startups and small businesses.
• The company is expanding its global data center presence, enhancing service availability.
• Vultr's focus on high-performance computing appeals to tech enthusiasts and developers.

What to watch

• Limited public information about Vultr's company culture and employee satisfaction.
• The cloud market is highly competitive, posing growth challenges for Vultr.
• Vultr's smaller scale compared to giants like AWS may impact large enterprise adoption.

Why Vultr

• Vultr provides a wide range of server locations, offering flexibility for global deployments.
• The company emphasizes simplicity and speed in deploying cloud infrastructure.
• Vultr's pay-as-you-go model provides financial flexibility for users.

Aplyr’s read is generated by AI from public sources.

About Vultr

Vultr

vultr.com

Vultr is a cloud infrastructure provider that offers high-performance cloud computing solutions, including virtual private servers, block storage, and dedicated instances.