Staff ML Engineer

External

Buildkite · ANZ Region

Full-time · On-site · 1mo ago · 30+ days old, may be filled

CI/CD · Documentation · Feature Engineering · Machine Learning · Observability · Python

About the role

About Buildkite

Buildkite's CI platform is trusted by the world's leading engineering teams, shipping software to over 1,000,000,000 daily users.

Job Overview

We're hiring a Staff Engineer (ML) to join our Test Engine team. In this role, you'll define and lead the technical strategy for machine learning within Test Engine - specifically, building the models and infrastructure behind predictive test selection: using code changes to determine which tests actually need to run.

Staff Engineers at Buildkite are hands-on technical leaders. You'll influence how we design, build, and scale systems while supporting other engineers to deliver their best work. You'll be the most senior ML practitioner in the company, setting the technical direction for how we approach test selection and establishing the patterns and infrastructure that the broader ML effort builds on.

🔧 About the Team

The Test Engine team helps engineering teams ship faster by giving them visibility and control over their test suites. Today, that means real-time flaky test detection and management, intelligent test splitting across parallel jobs, and performance analytics and tracing - all working across any CI/CD platform, not just Buildkite Pipelines.

Test Engine already ingests billions of test runs. We have deep visibility into test suites, codebases, and the relationships between them. The next step is using that data to answer a fundamental question: for a given code change, which tests are most likely to fail?

We believe the industry is moving away from running full test suites on every change. The teams that can shift their outer testing loop into a fast, precise inner loop - running only the tests that matter - will ship value to their customers dramatically faster. For many of our customers, that speed is existential. Switching costs are low, competition is fierce, and the teams with faster feedback loops win.

This is where ML comes in. If we can model the relationship between code changes and test failures, we can give engineering teams a fundamentally faster development cycle. We're not trying to optimise individual tests - we're trying to build a generalised solution to test selection that works across codebases, frameworks, and languages.

🚀 What You'll Do

Own Technical Direction for ML in Test Engine

- Lead and define the ML strategy for predictive test selection - from early experimentation through to models running reliably in production at scale
- Lead the technical investigation into how we build a generalised test selection model, and shape the approach based on what the data tells you
- Lead the design of the ML architecture end-to-end: feature engineering from code changes and test history, model training and evaluation, serving infrastructure, and feedback loops for continuous improvement
- Drive key decisions around model operationalisation - latency constraints (test selection has to be fast enough to sit in the critical path), prediction accuracy trade-offs, and graceful degradation when confidence is low
- Shape how ML capabilities integrate with Test Engine's existing data infrastructure - billions of ingested test runs, test-to-code mapping, and the intelligent splitting engine

Build and Scale the ML Platform

- Build the ML platform layer so that getting a model into production is fast and repeatable
- Design, build, and maintain the data pipelines that feed ML workloads - connecting code change signals with test execution history at scale
- Train, evaluate, and deploy models, taking ownership through to monitoring and retraining in production
- Instrument production models with observability metrics: prediction accuracy, latency, coverage, false negative rates, and drift detection
- Solve the hardest technical challenges at the intersection of code analysis and test data - feature extraction from diffs, generalisation across languages and frameworks, and handling the cold-start problem for new tests and repositories

Lead and Unblock

- Investigate and resolve complex performance and reliability issues across the data and ML stack
- Share knowledge and drive engineering best practices across teams through documentation, mentorship, and pairing
- Support the wider engineering organisation by contributing to cross-team tooling, infrastructure, and frameworks
- Communicate trade-offs effectively and build alignment around technical decisions
- Work closely with customers to understand how test selection fits into their development workflows, and ensure the product delivers real impact

🎨 Skills & Experience We Value

Technical Expertise

- Deep proficiency in Python, with strong experience building production ML systems end-to-end
- Proven experience designing and operating ML infrastructure at scale - model registries, feature stores, serving layers, experiment tracking, or similar
- Strong experience with data processing at scale - whether batch or streaming frameworks (Spark, Flink, or similar)
- Deep proficiency in SQL
- Comfort
