AI Benchmark Engineer - Native Language Specialist - German - Remote

Lilt-production · (remote), Germany

Contract · Remote · 4mo ago

Prompt Engineering · Python · SAFe · Shell Scripting · Typography

About the role

We are building a rigorous, verifiable evaluation suite of Terminal-Bench tasks designed to test the limits of large language models on multilingual software challenges. Our goal is to measure multilingual robustness in three areas: the effect of the prompt language, the processing of non-English data, and complex locale and encoding edge cases in terminal workflows.

We are seeking experienced native-speaking software engineers to design, build, and validate these benchmarks. You will create high-signal, high-quality tasks that genuinely test a model's ability to handle multilingual environments without relying on English translation as a crutch.

Note: this is a remote, freelance opportunity.

What You'll Deliver

Task Engineering: Design and build tasks for evaluating coding agents.
Asset Creation: Build realistic task environments using datasets and files in your native language. Crucially, these assets must remain in the target language to genuinely measure multilingual handling.
Prompting & Translation: Find failure points where AI models break down when prompted in your native language.
Implementation & Verification: Support the development of robust reference solutions and write highly reliable, deterministic verifier scripts, using rubric-based judging only when strictly necessary.
Calibration & Execution: Analyze execution logs and calibrate task difficulty (Easy to Very Hard) using standard Terminal-Bench run configurations against various model tiers (Haiku, Sonnet, Opus).
Quality Assurance: Participate in a rigorous four-layer human quality control process (creation, human review, calibration review, and audit), alongside automated LLM-based checks, to ensure fairness, grammatical accuracy, and benchmark integrity.
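To make the idea of a deterministic verifier script more concrete, here is a minimal Python sketch for a hypothetical task whose expected answer is a short list of German lines written to a file. The function names (`read_utf8_strict`, `verify_output`) and the rule that output must be NFC-normalized are assumptions for illustration only, not Terminal-Bench's actual verifier interface or Lilt's conventions.

```python
import tempfile
import unicodedata
from pathlib import Path


def read_utf8_strict(path: Path) -> str:
    """Decode a file as UTF-8, failing loudly instead of substituting characters."""
    # errors="strict" raises UnicodeDecodeError on invalid bytes, so a file the
    # agent wrote in Latin-1 or cp1252 fails instead of being silently "repaired".
    return path.read_bytes().decode("utf-8", errors="strict")


def verify_output(path: Path, expected_lines: list[str]) -> tuple[bool, str]:
    """Return (passed, reason) for a hypothetical task whose answer is a list of lines."""
    try:
        text = read_utf8_strict(path)
    except FileNotFoundError:
        return False, "output file missing"
    except UnicodeDecodeError as exc:
        return False, f"not valid UTF-8: {exc.reason}"
    # Assumption: the task spec requires NFC, so "Ä" as U+00C4 passes while
    # "A" + U+0308 (combining diaeresis) is rejected rather than silently accepted.
    if text != unicodedata.normalize("NFC", text):
        return False, "output is not NFC-normalized"
    actual = text.splitlines()
    if actual != expected_lines:
        return False, f"expected {expected_lines!r}, got {actual!r}"
    return True, "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "answer.txt"
        out.write_text("Straße\nÄpfel\n", encoding="utf-8")
        print(verify_output(out, ["Straße", "Äpfel"]))  # (True, 'ok')
        # Same text, but encoded as Latin-1: rejected as invalid UTF-8.
        out.write_bytes("Straße\n".encode("latin-1"))
        print(verify_output(out, ["Straße"]))
```

Returning a reason string alongside the boolean keeps the check fully deterministic while making failures easy to interpret when reviewing execution logs.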

Requirements

Experience: 1+ years of industry experience in software or prompt engineering.
Background: Proven track record at leading technology companies and/or graduation from top-tier engineering universities.
Language: Native or near-native fluency in German, with a deep understanding of its grammar, register, and phrasing rules. High proficiency in English.
Technical Stack: Strong proficiency in Python, standard shell scripting, and data processing.
Workflow: Extensive experience with Terminal/CLI-based development workflows and a working familiarity with coding agents.
Domain Expertise: Deep technical understanding of multilingual text processing pitfalls, including:
Encoding/decoding robustness and Unicode normalization.
Locale-dependent conventions (collation, casing, non-Gregorian dates).
Text I/O, toolchain interoperability, and safe string operations.
(For specific languages) Bidirectional/RTL handling, font fallbacks, and rendering/typography in UI or artifacts.
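Several of the pitfalls listed above can be illustrated for German with a short Python sketch covering Unicode normalization, caseless matching, and dictionary-style ordering. The helper names are hypothetical, and `german_sort_key` is a deliberately simplified approximation of DIN 5007-1 ordering (umlauts sort with their base vowel, ß as "ss"), not a full collation algorithm; real tasks would more likely rely on ICU or an installed `de_DE.UTF-8` locale.

```python
import unicodedata

# Simplified DIN 5007-1 ("dictionary") mapping: umlauts sort with their base
# vowel and ß expands to "ss". An illustrative approximation, not full collation.
_DIN_5007_1 = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def same_text(a: str, b: str) -> bool:
    """Compare after NFC normalization, so precomposed and decomposed umlauts match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def _fold(s: str) -> str:
    # Normalize, then apply full Unicode case folding (casefold maps "ß" to "ss").
    return unicodedata.normalize("NFC", s).casefold()


def caseless_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison; str.lower() alone would leave "ß" unchanged."""
    return _fold(a) == _fold(b)


def german_sort_key(word: str) -> tuple[str, str]:
    """Primary key ignores case and umlauts; the NFC original breaks ties deterministically."""
    nfc = unicodedata.normalize("NFC", word)
    return (nfc.lower().translate(_DIN_5007_1), nfc)


if __name__ == "__main__":
    print(same_text("M\u00fcller", "Mu\u0308ller"))  # True
    print(caseless_equal("STRASSE", "Straße"))       # True
    words = ["Zucker", "Äpfel", "Apfel", "Bär", "Bahn"]
    print(sorted(words))                       # ['Apfel', 'Bahn', 'Bär', 'Zucker', 'Äpfel']
    print(sorted(words, key=german_sort_key))  # ['Apfel', 'Äpfel', 'Bahn', 'Bär', 'Zucker']
```

Plain code-point sorting places "Äpfel" after "Zucker", while the DIN-style key keeps it next to "Apfel"; this is the kind of locale-dependent expectation a German-language task can probe.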
Why Collaborate with Lilt?
Your schedule, your rules. As an independent contractor, work when you want, as much or as little as you want. No fixed hours, no check-ins, no micromanaging.
Get paid quickly and fairly. We respect your time and your expertise. Competitive rates, prompt payments, no chasing invoices.
Work on projects that actually matter. Contribute to cutting-edge AI and language technology that is shaping how humans and machines communicate.
Be part of something bigger. Join a global community of linguists, subject matter experts, and language professionals who are advancing human knowledge together.
Grow without limits. As a Lilt contractor you get access to diverse, innovative projects that expand your portfolio and sharpen your skills across industries and domains.
How to join our expert community
1 - Submit your application, including an updated copy of your CV in English
2 - Next, complete a GenAI assessment to evaluate your skills
3 - Finalize onboarding and profile set-up in our system, and become eligible for Applied AI projects.
AI is changing how the world communicates, and LILT is leading that transformation.
Earn money. Have fun. Advance human knowledge. Work on diverse projects from anywhere, any time you want. Get paid quickly and fairly, and build your professional network in a supportive community, all through a streamlined application process tailored to your expertise.
Information collected and processed as part of your appl

Benefits

Remote work options
