Applied AI Research Engineer

Code Metal · Boston Hub

Full-time · Remote · Today

Hugging Face · Machine Learning · Move · Python · PyTorch · Reinforcement Learning

About the role

We're building next-generation AI systems that help military planners explore, compare, and evaluate operational courses of action. Our work combines frontier language models, simulation, planning, and verification into human-in-the-loop decision-support systems for defense applications. As an Applied AI Research Engineer, you'll focus on human-machine teaming and agentic AI to build systems that allow warfighters, planners, analysts, and decision-makers to explore operational choices with speed, confidence, and control.

This role focuses on designing and building agentic AI systems, not chatbots. You'll develop multi-agent workflows, fine-tune and evaluate models, build retrieval pipelines, experiment with post-training techniques, and integrate AI with simulation and planning software. You'll work closely with AI researchers, software engineers, and defense experts to turn research ideas into production-ready capabilities. The goal is to make complex planning, wargaming, adjudication, and analysis workflows faster, more explainable, and more trustworthy.

Research Areas of Interest

An incomplete list of ongoing and near-term directions:

Human-machine teaming for AI-assisted course-of-action development, comparison, critique, refinement, and operational decision support
Agentic planning systems that integrate language models with simulation, doctrine retrieval, external tools, structured outputs, and deterministic verification
Adapting and optimizing foundation models through fine-tuning, post-training, distillation, reinforcement learning, and rigorous evaluation for planning and decision-support tasks
Multi-agent AI systems for Red/Blue planning, control-cell support, adjudication, branch-and-sequel analysis, and collaborative planning workflows
Building reliable AI systems using self-correction, structured reasoning, constraint-aware generation, verification, and robust tool use
Learning from human expertise through planner feedback, preferences, approvals, synthetic data generation, and human-in-the-loop improvement
Trustworthy AI for high-consequence applications, with an emphasis on explainability, provenance, traceability, auditability, uncertainty estimation, and model behavior analysis

Responsibilities

Design and build agentic AI systems for planning, decision support, and human-machine teaming
Develop AI pipelines that integrate foundation models, retrieval, simulation, external tools, and deterministic software
Design, run, and analyze experiments to evaluate model and agent performance, reliability, traceability, latency, cost, and user trust
Fine-tune, distill, and evaluate foundation models for domain-specific planning, reasoning, and decision-support tasks
Build datasets, retrieval pipelines, automated benchmarks, and experiment infrastructure to support continuous model improvement and reproducible research
Partner with software engineers to transition research prototypes into scalable AI services
Collaborate with domain experts to translate operational workflows into AI-enabled capabilities while ensuring AI outputs remain explainable, reviewable, and under human control

Why Code Metal?

Mission with impact: Build AI systems that help users reason through high-consequence operational decisions.
AI beyond demos: Work on systems where models are paired with software, verification, simulation, guardrails, and human oversight.
Greenfield research: Explore ambitious ideas in GenAI, RL, agentic workflows, evaluation, and human-machine teaming.
Small-team velocity: Move quickly from research question to prototype to user-facing capability.
Real users: See your work tested by planners, analysts, engineers, and operational stakeholders.

Must-Have Credentials

Bachelor's or Master's degree in Computer Science, Machine Learning, Engineering, Mathematics, Physics, or a related technical field, or equivalent practical experience.
3+ years building AI, machine learning, or applied research systems.
Strong Python engineering skills.
Experience with PyTorch and modern LLM tooling (Transformers, vLLM, Hugging Face, etc.).
Experience building or deploying agentic AI systems, tool-calling workflows, or multi-step reasoning pipelines.
Experience fine-tuning, evaluating, or serving language models.
Experience with retrieval-augmented generation, embeddings, vector search, or knowledge retrieval systems.
Strong understanding of experiment design, benchmarking, and model evaluation.

Additional Information

About Code Metal

Code Metal is redefining code translation for mission-critical industries, helping defense partners move more quickly and reliably from algorithm to silicon. Our platform accelerates deployment of DSP, RF, communications, and embedded signal processing algorithms onto heterogeneous compute targets, including GPUs, FPGAs, ASICs, and edge SoCs. We also support automotive, aerospace, and semiconductor partners deploying complex algorithms onto constrained hardware with speed and rigor.

Interested in this role?

Apply on the company's website.
