Lead QA Engineer (AI & Agentic Systems)


Solstice · New York City

Full-time · On-site · Today

CI/CD · Compliance · LLMs · Playwright · Python · REST

About the role

We're hiring our first dedicated QA lead to own quality for the AI that powers Solstice. Our platform generates regulated pharmaceutical marketing content for the brands we work with, so when the output is wrong, say an unsupported claim or a missing safety disclosure, it becomes a real compliance problem and not just a bug to file.

What makes this hard is that the system is probabilistic. The same prompt can return different answers, "correct" is often a judgment call, and the exact-match assertions that traditional QA relies on don't apply. We need someone who can measure quality anyway and build the evals that catch regressions before a client does, so the rest of the team can keep moving quickly.

This role is about more than the models, though. Just as much of what keeps customers happy is ordinary product reliability: that the app does what it should, and that a frontend tweak or a new feature doesn't quietly break something in production. You'll own that side too, with a solid end-to-end suite in Playwright and the hands-on manual testing that catches what automation misses.

This is a senior, hands-on engineering job. Most of your time goes to writing code and building test infrastructure, but you'll also dig into manual testing whenever that's the fastest way to find a problem. Your work will reach across the whole product, from the backend services to the frontend customers use every day.

What you'll be working on

Build our evaluation systems. Because we can't check an output against a single correct answer, you'll design the evals that score quality instead and decide, with evidence, what is good enough to ship.
Make model and prompt changes safe. We swap models and rewrite prompts constantly. Your tooling should flag a drop in quality, a jump in cost, or a latency regression before a customer runs into it.
Test the agents for the ways they actually fail. Agents drift off their goal, loop on the same tool call, pick the wrong tool, or get hijacked by a malicious instruction buried in a document we ingest. Those are the cases you'll design for.
Protect the compliance-critical paths. The checks that keep an unsupported claim or a missing disclosure out of a finished asset are the ones that matter most, and you'll own how we test them, including verifying claims against approved source material.
Own end-to-end testing across the app. Build and maintain a Playwright suite that exercises the real user flows, from login through content creation and review, so a frontend or API change can't quietly break something a customer depends on.
Run hands-on manual and exploratory QA. Automation misses things, especially on new features and messy UI states. You'll test releases by hand, dig for the edge cases, and be the last set of eyes before we ship.
Get CI/CD quality gates in place. Today nothing runs automatically when someone opens a pull request: no tests, no linting, no type checks. Building that is yours.
Use production as a test bed. We already trace and monitor what the system does once it's live. You'll turn those signals into drift detection and into new regression tests whenever something slips past us.
Harden the background jobs. A lot of our work runs in long pipelines, so they need to survive retries, timeouts, and worker crashes without dropping or duplicating work.
Set the testing bar. As our first QA hire, you'll define what good testing looks like here and help the rest of the team write code that's easy to trust.

Requirements

This is an engineering role first. You should be comfortable building testing and evaluation tools in code, and equally comfortable rolling up your sleeves for hands-on manual testing when that's what the situation needs.
Strong Python, and real experience building test infrastructure and getting it to run automatically in CI/CD.
Strong end-to-end and UI test automation, especially

Additional Information

Why you should join us

Solstice is redefining how life sciences organizations commercialize their therapeutics. We are building a commercial engine that allows pharmaceutical marketers to launch campaigns at 100x the speed.

Rapid growth: Over the past year, we have been working with some of the world's top life sciences manufacturers, including Pfizer and AstraZeneca, and are working with 50+ pharma brands.
Frontiers of technology: We're building applications and rapidly iterating on technical approaches at the frontiers of AI, experimenting with LLMs and new frameworks. We are incorporating new, advanced frameworks into our solution to better define the possibilities of pharmaceutical marketing.
Top-tier investors: We've raised from investors like Transformation Capital, Twelve Below, Virtue, and the founders of Datavant, Commure, and Paradigm to supercharge our growth and build an elite team of engineers and operators.

Building anything great requires commitment and dedication. We're looking for someone who has the fire to prove others wrong.

