Staff Applied AI Scientist

Culture Amp · Sydney, Australia

Full-time · On-site · Posted 5 days ago

Asana · Documentation · Observability · Prompt Engineering · Python · RAG

About the role

We're big believers in the power of IRL, so for most roles we ask Campers to work from their local Culture Amp office an average of 2 days a week to unlock connection, pace and culture together.

Join us on our mission to make a better world of work. Culture Amp is the world's leading employee experience platform, revolutionizing how 25 million employees across more than 6,000 companies create a better world of work. Culture Amp empowers companies of all sizes and industries to transform employee engagement, drive performance management and develop high-performing teams. Culture Amp is powered by people science and the most comprehensive employee dataset in the world. The most innovative companies, including Canva, On, Asana, Dolby, McDonald's and Nasdaq, depend on it every day. Culture Amp is backed by leading venture capital funds and has offices in the US, UK, Germany and Australia. It has been recognized by Forbes as one of the world's top private cloud companies and by Fast Company as one of the most innovative companies. For more information, visit cultureamp.com.

We are looking for a Staff Applied AI Scientist to join the team behind AI Coach. AI Coach is Culture Amp's contextually aware AI coaching system, and it turns survey insights, performance data and interpersonal dynamics into personalised assistance at scale.

Shipping an AI product is only the beginning. The harder problem, and one few teams have solved, is measuring on an ongoing basis whether that product is working well in production, why its quality changes and how to make it better. In this role you will scale the effective measurement and improvement of our AI products in production. That means first establishing online evaluation of live AI features, then making it sustainable by enabling the rest of our engineering org to do the same.

As part of this team of amazing humans, you will:
- Design and run sampling, LLM-as-a-judge and labelling systems over de-identified production traces (for example, with Langfuse) to build longitudinal evaluation monitoring and alerting.
- Build LLM-powered analysis that works out why performance moved and recommends prompt or system changes to improve the product.
- Own the full feedback loop: prompt engineering, evaluation at scale, data labelling and continuous improvement.
- Enable others through reusable frameworks, tooling and documentation so that product and engineering teams can run their own evaluations. Lead from the front, then hand over.
- Partner closely with the Coach, product, data science and people science teams so that measured quality maps to real customer value.
- Stay current with the latest evaluation, observability and LLMOps research and provider offerings.

You have:
- Proven experience analysing the performance of AI or data products in production and turning that analysis into changes that maintained or improved the product.
- Hands-on experience with LLM evaluation in production: LLM-as-a-judge, eval datasets, human-in-the-loop labelling and scoring against thresholds.
- Experience with observability for LLM and agentic systems, including traces, sampling, prompt management and production monitoring with Langfuse or a comparable tool.
- Experience with longitudinal measurement: metrics and baselines, regression detection and quality tracking over time.
- Proven commercial experience taking ML or AI systems to production, and strong software engineering fundamentals (we work primarily in Python and TypeScript).
- An AI-native daily practice. You are comfortable using agentic coding tools (Claude Code, Cursor, Codex or similar) on multi-step tasks, and you have clear judgment about when to direct an agent and when to write the code yourself.
- Strong technical writing and communication skills, and a track record of building capability into systems and teaching others to own it.

Strong signals:
- Building or scaling an eval and observability practice across multiple teams
- Evolving existing enterprise codebases with AI
- Production agentic systems (orchestration, RAG)
- A postgraduate degree in ML, CS, Applied Maths or a related field
- Public writing, talks or open-source work in eval, observability or LLMOps

You are:
- Motivated by breaking new ground in an emerging field, with the humility to learn in public and the resilience to be a self-starter.
- Motivated by enablement. Your biggest wins come from teaching others and building that knowledge into our systems, which can mean you do not own what you build forever.

The way we build at Culture Amp

At Culture Amp, our engineers are increasingly orchestrating agents that write code, rather than only writing it themselves. We guide loops of planning, building and reviewing in which AI takes the initiative on routine work, so you can steer architecture, trade-offs and quality. We're investing in a shared "harness" of tooling and standards so agents can do real product work safely, and we all embrace these capabilities as a core part of how we ship.

Please note: candidates must be legally authorised to work in Australia for the duration of employment. The role is based in our Melbourne or Sydney hub.

Perks & Benefits
