Senior Machine Learning Engineer

Rubrik · Palo Alto, CA

Full-time · On-site · 4d ago

LLMs · Routing

About the role

About the Team & Role

We're building SAGE, Rubrik's Semantic AI Governance Engine, the first system designed to monitor, govern, and remediate autonomous AI agents in real time. SAGE powers Rubrik Agent Cloud: enterprises define governance policies in natural language, and SAGE's custom small language models (SLMs) act as judges on every agent action. These models are fast enough to sit in the live request path and accurate enough that customers trust them with allow/block decisions on production traffic. At its core, SAGE is "LLM-as-judge" applied to AI governance: the same technique most teams use for offline evaluation, productionized for real-time enforcement at enterprise scale.

Our first-generation SLM Policy Guard already outperforms the larger frontier models we've benchmarked against on accuracy while running approximately 5x faster on the same workload. We're hiring to push that lead even further.

As an Applied ML Engineer on the SAGE team, you'll work end-to-end across the model lifecycle: curating data, training small models, serving them at production latency, and closing the feedback loop with real customer signals. The models you build won't just enforce policies in the live request path; they will also drive Agent Rewind, Rubrik's capability to instantly and precisely undo destructive autonomous-agent actions and restore the affected data to a trusted state.

We're a collaborative, applied team that ships models to enterprise customers within weeks, and we're passionate about proving that small, specialized models can outperform frontier LLMs on the problems that matter most for AI safety and governance.
Nature of the Specialized Duties

➢ Training, Fine-Tuning, and Distilling Production Small Language Models and Classifiers (25% of time)
Owning the full training lifecycle for the SLMs and classifiers in SAGE's real-time enforcement path, including base-model selection, supervised fine-tuning, preference optimization (DPO/RLAIF), and distillation from frontier teacher models.
Training anomaly and action-severity models that catch novel agent-side attack patterns, such as supply-chain compromises or emergent destructive behaviors not covered by any explicit policy, at real-time decision latency. Severity scores route the highest-impact events to Agent Rewind for precise remediation.
Designing adversarial training pipelines, such as purpose-built adversarial agents and automated red teams, whose outputs feed directly into the next training run, turning every discovered weakness into a permanent model improvement.
Pushing the Pareto frontier of accuracy, latency, and cost for governance-specific tasks through deliberate post-training choices (LoRA, quantization-aware training, distillation recipes, GRPO, etc.) and validating the wins on production traffic patterns.

➢ Engineering High-Performance Model Serving and Inference Infrastructure (25% of time)
Designing multi-stage inference pipelines that handle both real-time enforcement (inline prompt, response, and tool-call blocking) and high-throughput batch workloads (offline scoring, back-testing, corpus mining) while processing billions of tokens daily across Global 2000 customer agent fleets.
Optimizing live deployments through shared GPU pools, KV-cache-aware routing, continuous batching, FP8/INT8 quantization, and speculative decoding to minimize inference cost while holding sub-second P99 SLOs.
Building serving-layer infrastructure, including model gateway design, request routing, and graceful degradation, that lets SAGE block agent prompts, responses, and tool calls in real time without becoming a latency bottleneck.
Owning canary, shadow, and A/B traffic patterns so new model variants are validated against live customer traffic before they make enforcement decisions.

➢ Building Synthetic Data Pipelines and Online + Offline Evaluation Frameworks (20% of time)
Designing automated data curation pipelines that mine live customer environments (with privacy and tenancy guarantees) for high-value, per-tenant training examples, such as long-tail violations, near-miss policy edges, or novel agent behaviors, and route them back into each customer's training loop.
Building automated policy back-testing that replays historical agent traffic against new model and policy versions to catch regressions and recommend policy improvements before customer-visible deployment.
Building online evaluation systems for live model decisions, including shadow scoring, drift detection, calibration monitoring, and policy-coverage gap analysis, so that quality regressions surface in minutes rather than weeks.
Generating synthetic data with frontier teachers (adversarial prompts, policy-edge cases, multi-turn interactions) and evaluating it to confirm that it improves downstream quality, not just dataset size.

➢ Insights Mining, Failure Diagnosis, and Adaptive Model Improvement (15% of time)
Building memory and context
