AI Benchmark Engineer - Native Language Specialist - German - Remote

Lilt-production · (remote), Germany
Contract · Remote · 4mo ago
Prompt Engineering · Python · SAFe · Shell Scripting · Typography

About the role

We are building a rigorous, verifiable evaluation suite of Terminal-Bench tasks designed to test the limits of large language models on multilingual software challenges. Our goal is to measure multilingual robustness across prompt language effects, non-English data processing, and complex locale/encoding edge cases in terminal workflows. We are seeking experienced native-speaking software engineers to design, build, and validate these benchmarks. You will create high-signal, high-quality tasks that genuinely test a model's ability to handle multilingual environments without relying on English translation crutches.

Note: this is a remote, freelance opportunity.

What You'll Deliver

  • Task Engineering: Design tasks that evaluate coding agents.
  • Asset Creation: Build realistic task environments using datasets and files in your native language. Crucially, these assets must remain in the target language to genuinely measure multilingual handling.
  • Prompting & Translation: Find the failure points where AI models break down when working in your native language.
  • Implementation & Verification: Support the development of robust solutions (reference implementations) and write highly reliable, deterministic verifier scripts (using rubric-based judging only when strictly necessary).
  • Calibration & Execution: Analyze execution logs and calibrate task difficulty (Easy to Very Hard) using standard Terminal-Bench run configurations against various model tiers (Haiku, Sonnet, Opus).
  • Quality Assurance: Participate in a rigorous, 4-layer human quality control process (creation, human review, calibration review, and audit) alongside automated LLM-based checks to ensure fairness, grammatical accuracy, and benchmark integrity.

Requirements

  • Experience: 1+ years of industry experience in software or prompt engineering.
  • Background: Proven track record at leading technology companies and/or graduation from top-tier engineering universities.
  • Language: Native or near-native fluency in German, with a deep understanding of its grammar, register, and phrasing rules. High English proficiency.
  • Technical Stack: Strong proficiency in Python, standard shell scripting, and data processing.
  • Workflow: Extensive experience with Terminal/CLI-based development workflows and a working familiarity with coding agents.
  • Domain Expertise: Deep technical understanding of multilingual text processing pitfalls, including:
      • Encoding/decoding robustness and Unicode normalization.
      • Locale-dependent conventions (collation, casing, non-Gregorian dates).
      • Text I/O, toolchain interoperability, and safe string operations.
      • (For specific languages) Bidirectional/RTL handling, font fallbacks, and rendering/typography in UI or artifacts.
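
As an illustration of the normalization and casing pitfalls listed above, here is a minimal Python sketch using only the standard library. It is not taken from the benchmark itself; the helper name `same_text` and the sample strings are assumptions chosen for the example.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization
    and Unicode case folding, so visually identical German text matches."""
    norm_a = unicodedata.normalize("NFC", a).casefold()
    norm_b = unicodedata.normalize("NFC", b).casefold()
    return norm_a == norm_b


# "ü" stored as a single precomposed code point (U+00FC).
composed = "M\u00fcller"
# "ü" stored as "u" followed by a combining diaeresis (U+0308).
decomposed = "Mu\u0308ller"

# A naive comparison treats the two spellings as different strings.
print(composed == decomposed)

# After normalization and case folding they compare equal.
print(same_text(composed, decomposed))

# Case folding maps "ß" to "ss", so these two spellings also match.
print(same_text("Straße", "STRASSE"))
```

A deterministic verifier that compares file names or text fields byte for byte would reject one of the two `Müller` spellings, which is the kind of edge case such tasks can be built around.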

Why Collaborate with Lilt?

  • Your schedule, your rules. As an independent contractor, work when you want, as much or as little as you want. No fixed hours, no check-ins, no micromanaging.
  • Get paid quickly and fairly. We respect your time and your expertise. Competitive rates, prompt payments, no chasing invoices.
  • Work on projects that actually matter. Contribute to cutting-edge AI and language technology that is shaping how humans and machines communicate.
  • Be part of something bigger. Join a global community of linguists, subject matter experts, and language professionals who are advancing human knowledge together.
  • Grow without limits. As a Lilt contractor you get access to diverse, innovative projects that expand your portfolio and sharpen your skills across industries and domains.

How to join our expert community

  • 1 - Submit your application, including an updated copy of your CV in English.
  • 2 - Next, complete a GenAI assessment to evaluate your skills.
  • 3 - Finalize onboarding and profile set-up in our system, and become eligible for Applied AI projects.

AI is changing how the world communicates, and LILT is leading that transformation.

Earn money. Have fun. Advance human knowledge. Work on diverse projects from anywhere, any time you want. Get paid quickly and fairly, and build your professional network in a supportive community, all through a streamlined application process tailored to your expertise.

Information collected and processed as part of your appl

Benefits

Remote work options

Interested in this role?

Apply on the company's website.
