Research Engineer - Evals
Responsibilities
- Run fast experiments and communicate clearly. You design experiments that test meaningful hypotheses, run them quickly, and make decisions based on results. When you have findings, anyone on the team can understand what they mean - no decoder ring required.
Requirements
- Fast and clear. You'd rather run three rough experiments this week than one polished one next month. When you have results, anyone on the team can unde
Additional Information
You'll build the evaluation systems that tell us whether Firecrawl actually works. That sounds simple. It isn't. Our core promise - convert any URL into clean, structured, LLM-ready data reliably - is hard to measure rigorously across millions of different websites, formats, and edge cases. As we layer in models and agent workflows, the question "did that work?" gets harder, not easier.
This isn't an eval role where you inherit a framework and run benchmarks. You'll design the metrics, build the pipelines, generate the datasets, and own the feedback loop from output quality back to model and product decisions. If you care about what "good" actually means and have the engineering depth to measure it, this is the role.
- Salary Range: $160,000 to $240,000/year (Range shown is for U.S.-based employees in San Francisco, CA. Compensation outside the U.S. is adjusted fairly based on your country's cost of living.)
- Equity Range: Up to 0.10%
- Location: San Francisco, CA or Remote (Americas, UTC-3 to UTC-10)
- Job Type: Full-Time
- Experience: 3+ years in ML engineering, applied AI, or data quality - with production systems
- Visa: US Citizenship/Visa required for SF; N/A for Remote
About Firecrawl
Firecrawl is the easiest way to extract data from the web. Developers use us to reliably convert URLs into LLM-ready markdown or structured data with a single API call. In just a year, we've hit 8 figures in ARR and 120k+ GitHub stars by building the fastest way for developers to get LLM-ready data.
We're a small, fast-moving, technical team building essential infrastructure superintelligence will use to gather data on the web. We ship fast and deep.