Staff ML Engineer

Buildkite

ANZ Region
Remote
Posted April 30, 2026

Job Description

About Buildkite

Buildkite's CI platform is trusted by the world's leading engineering teams, shipping software to over 1,000,000,000 daily users.

Job Overview

We're hiring a Staff Engineer (ML) to join our Test Engine team. In this role, you'll define and lead the technical strategy for machine learning within Test Engine — specifically, building the models and infrastructure behind predictive test selection: using code changes to determine which tests actually need to run.

Staff Engineers at Buildkite are hands-on technical leaders. You'll influence how we design, build, and scale systems while supporting other engineers to deliver their best work. You'll be the most senior ML practitioner in the company, setting the technical direction for how we approach test selection and establishing the patterns and infrastructure that the broader ML effort builds on.

About the Team

The Test Engine team helps engineering teams ship faster by giving them visibility and control over their test suites. Today, that means real-time flaky test detection and management, intelligent test splitting across parallel jobs, and performance analytics and tracing — all working across any CI/CD platform, not just Buildkite Pipelines.

Test Engine already ingests billions of test runs. We have deep visibility into test suites, codebases, and the relationships between them. The next step is using that data to answer a fundamental question: for a given code change, which tests are most likely to fail?

We believe the industry is moving away from running full test suites on every change. The teams that can shift their outer testing loop into a fast, precise inner loop — running only the tests that matter — will ship value to their customers dramatically faster. For many of our customers, that speed is existential. Switching costs are low, competition is fierce, and the teams with faster feedback loops win.

This is where ML comes in. If we can model the relationship between code changes and test failures, we can give engineering teams a fundamentally faster development cycle. We're not trying to optimise individual tests — we're trying to build a generalised solution to test selection that works across codebases, frameworks, and languages.

What You'll Do

Own Technical Direction for ML in Test Engine

  • Lead and define the ML strategy for predictive test selection — from early experimentation through to models running reliably in production at scale
  • Lead the technical investigation into how we build a generalised test selection model, and shape the approach based on what the data tells you
  • Lead the design of the ML architecture end-to-end: feature engineering from code changes and test history, model training and evaluation, serving infrastructure, and feedback loops for continuous improvement
  • Drive key decisions around model operationalisation — latency constraints (test selection has to be fast enough to sit in the critical path), prediction accuracy trade-offs, and graceful degradation when confidence is low
  • Shape how ML capabilities integrate with Test Engine's existing data infrastructure — billions of ingested test runs, test-to-code mapping, and the intelligent splitting engine

Build and Scale the ML Platform

  • Build the ML platform layer so that getting a model into production is fast and repeatable
  • Design, build, and maintain the data pipelines that feed ML workloads — connecting code change signals with test execution history at scale
  • Train, evaluate, and deploy models, taking ownership through to monitoring and retraining in production
  • Instrument production models with observability metrics: prediction accuracy, latency, coverage, false negative rates, and drift detection
  • Solve the hardest technical challenges at the intersection of code analysis and test data — feature extraction from diffs, generalisation across languages and frameworks, and handling the cold-start problem for new tests and repositories

Lead and Unblock

  • Investigate and resolve complex performance and reliability issues across the data and ML stack
  • Share knowledge and drive engineering best practices across teams through documentation, mentorship, and pairing
  • Support the wider engineering organisation by contributing to cross-team tooling, infrastructure, and frameworks
  • Communicate trade-offs effectively and build alignment around technical decisions
  • Work closely with customers to understand how test selection fits into their development workflows, and ensure the product delivers real impact

Skills & Experience We Value

Technical Expertise

  • Deep proficiency in Python, with strong experience building production ML systems end-to-end
  • Proven experience designing and operating ML infrastructure at scale — model registries, feature stores, serving layers, experiment tracking, or similar
  • Strong experience with data processing at scale, using batch or streaming frameworks (Spark, Flink, or similar)
  • Deep proficiency in SQL
  • Comfort working in cloud environments (AWS) and with containerised workloads (Docker, Kubernetes)
  • In short, we expect equal comfort and a high level of capability across the end-to-end process, from designing and building models through to deploying them

ML & Domain Experience

  • Hands-on experience training, evaluating, and deploying ML models in production — you're a practitioner, not only an infrastructure builder
  • Experience with classification, ranking, or prediction problems where the signal-to-noise ratio is challenging — test selection shares characteristics with anomaly detection, change-point detection, and predictive filtering
  • Track record of building ML capabilities that scaled beyond a single use case — not just one-off models but repeatable, generalised approaches
  • Experience with feature engineering from structured and semi-structured data (code diffs, execution logs, dependency graphs, or similar)
  • Experience instrumenting production models with observability: accuracy, latency, coverage, drift

Collaboration and Communication

  • Excellent written and verbal communication skills, especially in a remote-first environment
  • Ability to distil complex technical concepts into clear explanations for diverse audiences
  • A collaborative, pragmatic mindset — balancing technical quality with business context
  • Comfortable mentoring engineers and leading technical discussions across teams
  • Proven ability to build alignment across teams and influence technical direction without authority

Nice to Have

  • Experience with code analysis, static analysis tools, or building features from source code structure
  • Familiarity with CI/CD systems, developer tooling, or test infrastructure
  • Experience with Ruby on Rails, React, GraphQL, or Go
  • Background in search ranking, recommendation systems, or other domains where you're predicting relevance from sparse signals
  • Experience working with test frameworks or test execution data

✨ Why Join Buildkite

At Buildkite, we value kindness, autonomy, and collaboration. You'll be joining a remote-first company where your work directly helps some of the world's best engineering teams build and ship software faster and more safely.

  • Competitive compensation, including salary, equity, and benefits package
  • Flexible, remote-first culture (Remote in the ANZ & PST Regions)
  • Meaningful technical challenges at scale
  • Opportunities for professional growth, technical leadership, and cross-team influence
  • A collaborative, inclusive, and innovative culture where your ideas make a real impact

Equal Opportunity Employer

At Buildkite, we value diversity and celebrate all types of skills, backgrounds, and experiences. We’re dedicated to fostering an inclusive environment and providing reasonable accommodations throughout our recruitment process.

If you need any accommodations or support during the application or interview process, please reach out to us at accommodations@buildkite.com.
