Principal MLOps Engineer

Raft

Remote, US; DMV; McLean, VA; Boston, MA; San Antonio, TX; Colorado Springs, CO; Tampa, FL; Honolulu, HI
Hybrid
Posted April 17, 2026

Job Description

This is a U.S.-based position. All of the programs we support require U.S. citizenship to be eligible for employment. All work must be conducted within the continental U.S.

Who we are:

Raft (https://TeamRaft.com) is a customer-obsessed, non-traditional defense tech company dedicated to empowering U.S. military and government agencies with cutting-edge AI/ML and data solutions. We are a leader in autonomous data fusion and Agentic AI, with a purposeful focus on Distributed Data Systems, Platforms at Scale, and Complex Application Development. We are headquartered in McLean, VA, and our clients include innovative federal and public agencies that leverage design thinking, a cutting-edge tech stack, and a cloud-native ecosystem. We build digital solutions that impact the lives of millions of Americans.

We’re looking for an experienced Principal MLOps Engineer to support our customers and join our passionate team of high-impact problem solvers.

About the role:

Raft is building mission-critical AI and data platforms for the Department of Defense (DoD). Our systems ingest and process massive volumes of real-time data from hundreds of sensors and operational sources, transform that data into usable intelligence, and deliver it to operators through mission applications and common operational pictures that support time-sensitive decision-making.

Our platform operates at scale, processing billions of events per day with low-latency data pipelines and cloud-native infrastructure. As Raft expands its AI capabilities, we are investing in a more mature end-to-end machine learning platform to support model development, evaluation, deployment, monitoring, and lifecycle management across both cloud and constrained operational environments.

In this role, you will help design, deploy, and mature Raft’s ML platform and MLOps infrastructure. You will work across Kubernetes-based deployment environments, GPU-enabled infrastructure, model serving systems, CI/CD pipelines, and secure production operations to enable rapid and reliable delivery of machine learning capabilities. This role is ideal for someone who understands both the infrastructure needed to run ML systems in production and the practical needs of ML engineers building and deploying models.

What you’ll do:

  • Design, build, and maintain secure, scalable MLOps infrastructure and deployment pipelines for production ML systems
  • Help mature Raft’s internal ML platform and model lifecycle capabilities, including model packaging, registry/catalog workflows, deployment, monitoring, and operational support
  • Deploy and manage machine learning workloads on Kubernetes, including GPU-enabled clusters
  • Support model serving and inference infrastructure for a range of ML use cases, including traditional ML, computer vision, speech/audio, and LLM-based systems

Tags: Python, Go, AWS, Azure, Kubernetes, Docker, Machine Learning, AI, DevOps, Data