Model Behavior Engineer

Notion · New York, NY
$98K–$140K/yr · Full-time · Remote · 3mo ago
LLMs · Move · Notion · Observability · SQL

About the role

Notion is the collaborative AI workspace where teams and agents think together. We're building one place where your knowledge, projects, meetings, and AI tools live side by side, so work is faster, clearer, and less fragmented. Millions of individuals, small teams, and large companies run their work on Notion. Notinos (our employees) are customer zero in bringing this future of work to life. We care about craft, building things that last, and the belief that great work is still fundamentally human. Our goal isn't to ship the next feature. Every team of Notinos is working to set the standard for how humans work together in the AI era. From building a business's system of record to making and managing AI agents to automating away busywork, we care deeply about giving our customers more time for their life's work.

You'll own the quality bar for Notion AI products. You'll work with product and engineering teams to build systems that define what "good" looks like, measure our progress, and drive the changes needed to deliver reliable, high-quality AI experiences. Your work directly shapes how Notion's AI products behave for millions of users.

This isn't a traditional software engineering role; it's an art & science role. You won't spend your days writing code. Instead, you'll focus on understanding and shaping how our AI products behave through context engineering, designing evaluation systems, and analyzing data. The role sits within our AI engineering team, working directly with engineering, product, design, and data. It's a unique blend of ops, strategy, and product thinking. Day to day, you'll live in production data, ship prompt fixes, and run evals, and in doing so you'll shape our quality strategy. As part of that, you'll help shape Notion's model strategy and work directly with frontier AI labs (OpenAI, Anthropic, Google) to evaluate and launch new models.
We're looking for problem-seeking generalists interested in 0 → 1: curious people with high agency who thrive in ambiguous, fast-moving product areas. We're building a product, but we're also building a new function. You'll have real ownership from day one and help write the playbook as we scale.

What You'll Achieve

  • Context engineering - Design, test, and iterate on system prompts, tool prompts, and context strategies that shape how Notion's AI products behave. Understand the nuances of how models respond to different context structures, and use that knowledge to drive quality improvements directly.
  • Understand & debug - Live in production data: transcripts, logs, and user feedback. Reproduce issues, identify root causes, and translate symptoms into actionable problem statements. Find signal in noisy data.
  • Build evals & measurement - Design eval strategies, build datasets, and run evaluations. Track quality over time and identify issues before users do. Own the loop: define quality goals, create evals, test, and improve.
  • Evaluate and launch new models with leading research labs - Evaluate and launch models from OpenAI, Anthropic, Google, and others. Benchmark across dimensions such as quality, latency, cost, and edge cases. Help shape Notion's model strategy based on real data.
  • Drive quality priorities - Work embedded with engineering and product teams to surface the most important issues. Own the quality narrative: severity, frequency, what to fix, and why. Be the voice of quality in the room.
  • Build tooling & systems - Help manage AI observability and eval platforms (e.g., Braintrust). Build the playbooks and tools that enable every team at Notion to build AI products.

Skills You'll Need to Bring

  • Driver mentality - You treat problems as yours. If something's broken, it's your job to fix it, even if you didn't cause it. You have a bias to action.
  • Curiosity - You're excited about exploring the "jagged frontier" of LLM capabilities and how AI products meet reality.
  • Analytical instinct - Your first move is to look at data. You can find signal in noise.
  • Comfortable working with data - You can self-serve insights from large datasets, whether through SQL, coding agents, or other tools.
  • Clear communication - You can explain complex issues simply.
  • Experience with LLMs, prompting, or AI products.

Requirements

  • A background in engineering, product, data science, research, or consulting
  • You've built something on your own to solve a problem: a side project, startup, tool, or whatever else

Benefits

  • Paid time off
  • Equity / stock options

Interested in this role?

Apply on the company's website.