AI Video Agent Engineer
About the role
We are building an AI-driven video production system designed as an intelligent multi-agent orchestration layer capable of transforming raw ideas and references into fully structured video content. The system operates as a black-box creative engine for different types of users:
- professional video creators
- casual users
- creators of short narrative concepts
Users provide an idea, references, or media fragments, and the system automatically orchestrates multiple AI agents responsible for analysis, scripting, editing, and production of video content. We are looking for an engineer who can design and implement multi-agent pipelines, orchestrate AI tools, and build intelligent workflows that combine video analysis, storytelling logic, and automated editing. This role sits at the intersection of AI systems architecture, creative tooling, and multimodal content generation.
Responsibilities
- AI Agent Architecture
  - Design and implement the architecture of a multi-agent video editing system, including agents responsible for:
    - video analysis
    - narrative generation
    - editing orchestration
    - production and output synthesis
  - Define system prompts, behavioral rules, and structured instructions for agents interacting within the pipeline.
- Pipeline Orchestration (n8n)
  - Develop and maintain complex orchestration pipelines in n8n, including:
    - multi-agent workflows
    - tool-calling logic
    - dynamic routing between tools and models
    - context passing between agents
  - Pipelines must be capable of selecting the most appropriate models, tools, and strategies depending on the task.
- Multimodal Data Processing
  - Design robust pipelines for handling:
    - video materials
    - image assets
    - user text prompts
    - structured metadata
  - Ensure proper data transformation and context transfer across pipeline stages.
- Tool & API Integration
  - Integrate both external and internal APIs for multimodal generation and processing, including:
    - image generation
    - video generation
    - speech synthesis
    - audio generation
    - video processing services
  - Rapidly evaluate available APIs and select the highest-quality tools and models for each task.
- Model Orchestration & Optimization
  - Tune and optimize model interactions, primarily with Gemini models, including:
    - prompt engineering
    - structured outputs
    - tool-calling workflows
    - agent collaboration logic
  - Optimize pipelines for quality, reliability, and execution efficiency.
- Future Architecture (RAG & Knowledge Systems)
  - Design systems that support:
    - vector databases
    - retrieval-augmented generation (RAG)
    - memory and contextual reasoning between agents
- Expected Outcomes
  - The pipeline system should be capable of:
    - Taking an idea + references as input
    - Analyzing the content
    - Generating a coherent narrative structure
    - Selecting appropriate visual and audio elements
    - Producing a high-quality, structured video output
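The expected flow above can be sketched in plain Python. A shared context object passes from agent to agent, and a simple routing table picks a tool for each asset type. This is an illustration under stated assumptions, not the team's implementation. In the described system these stages would live in n8n workflows and call Gemini and other APIs. Here every agent is a stub, and `PipelineContext`, `ROUTES`, `route`, the agent functions, and the tool identifiers are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineContext:
    """Shared state handed from one agent to the next (hypothetical schema)."""
    idea: str
    references: list[str]
    analysis: dict = field(default_factory=dict)
    narrative: list[str] = field(default_factory=list)
    assets: list[dict] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


# Hypothetical routing table: asset type -> tool identifier.
# Real entries would point at whichever generation APIs were evaluated as best.
ROUTES = {
    "image": "image-gen-tool",
    "video": "video-gen-tool",
    "speech": "tts-tool",
    "music": "audio-gen-tool",
}


def route(asset_type: str) -> str:
    """Pick a tool for an asset type, falling back to a generic tool."""
    return ROUTES.get(asset_type, "generic-tool")


def analysis_agent(ctx: PipelineContext) -> PipelineContext:
    # Stub: a real agent would send the references to a multimodal model.
    ctx.analysis = {"reference_count": len(ctx.references)}
    ctx.log.append("analysis")
    return ctx


def narrative_agent(ctx: PipelineContext) -> PipelineContext:
    # Stub: a real agent would ask an LLM for a structured beat list.
    ctx.narrative = [f"setup: {ctx.idea}", "conflict", "resolution"]
    ctx.log.append("narrative")
    return ctx


def selection_agent(ctx: PipelineContext) -> PipelineContext:
    # For each beat, request one visual and one voice-over asset,
    # routing each request to a tool by asset type.
    for beat in ctx.narrative:
        for asset_type in ("video", "speech"):
            ctx.assets.append({"beat": beat, "type": asset_type, "tool": route(asset_type)})
    ctx.log.append("selection")
    return ctx


def production_agent(ctx: PipelineContext) -> PipelineContext:
    # Stub: a real agent would render and assemble the selected assets.
    ctx.log.append("production")
    return ctx


# Ordered list of stages; each takes and returns the shared context.
PIPELINE: list[Callable[[PipelineContext], PipelineContext]] = [
    analysis_agent,
    narrative_agent,
    selection_agent,
    production_agent,
]


def run_pipeline(idea: str, references: list[str]) -> PipelineContext:
    """Run every stage in order, passing the same context object along."""
    ctx = PipelineContext(idea=idea, references=references)
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx


if __name__ == "__main__":
    result = run_pipeline("a lighthouse keeper's last night", ["ref1.mp4", "ref2.jpg"])
    print(result.log)          # ['analysis', 'narrative', 'selection', 'production']
    print(len(result.assets))  # 6: three beats x two asset types
```

Every agent reads from and writes to a single context object. This keeps context passing explicit and lets each stage be replaced independently, for example by swapping the stub `narrative_agent` for a real model call.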
Requirements
- AI Systems & Agent Architecture
  - Strong experience building multi-agent systems, including:
    - intent and sub-intent modeling
    - agent orchestration
    - agent communication and transport layers
    - summarization pipelines
    - context passing between agents
- Workflow Orchestration
  - Hands-on experience with:
    - n8n
    - agent tool-calling
    - n8n MCP
  - Experience designing complex automation pipelines is essential.
- Programming
  - Strong practical coding skills with a vibe-coding mindset:
    - Primary languages:
      - Python
      - JavaScript
    - Bonus experience:
      - ComfyUI custom nodes
      - lightweight APIs (e.g., Hugging Face Spaces or inference endpoints)
- Multimodal Tooling Knowledge
  - Ability to quickly navigate API documentation and integrate tools for:
    - image generation
    - video generation
    - speech synthesis
    - audio generation
    - multimodal analysis
  - You should know where and how to obtain the best generation quality for each modality.
- Creative Thinking
  - Strong sense of visual rhythm and composition
  - Creative intuition and storytelling awareness
  - Good taste in video structure and montage
  - Ability to evaluate AI-generated output not only by metrics, but also by creative quality and narrative coherence
- Video & Media Processing
  - Preferred experience with:
    - FFmpeg
    - video processing pipelines
    - image processing workflows
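FFmpeg work in a video pipeline commonly involves generating commands that cut clips and then join them. The sketch below only builds argument lists and does not run FFmpeg. The helper names `trim_cmd`, `concat_list`, and `concat_cmd` are hypothetical. The flags are standard FFmpeg options: `-ss` and `-t` for seeking and duration, `-c copy` for stream copy, and the concat demuxer via `-f concat -safe 0`. Stream copy cuts on keyframes, so clip boundaries can be imprecise. Frame-accurate cuts would need re-encoding instead of `-c copy`.

```python
import shlex


def trim_cmd(src: str, start: float, end: float, dst: str) -> list[str]:
    """Build an ffmpeg command that copies [start, end) seconds of src into dst."""
    duration = end - start
    if duration <= 0:
        raise ValueError("end must be greater than start")
    # -ss before -i seeks in the input; -t limits the output duration;
    # -c copy skips re-encoding (cuts snap to keyframes).
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", src,
            "-t", f"{duration:.3f}", "-c", "copy", dst]


def concat_list(paths: list[str]) -> str:
    """Text for a concat-demuxer list file: one `file '...'` line per clip."""
    lines = []
    for p in paths:
        # Escape single quotes per ffmpeg's quoting rules: ' becomes '\''
        escaped = p.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_cmd(list_file: str, dst: str) -> list[str]:
    """Build an ffmpeg command that joins the clips named in list_file."""
    # -safe 0 lets the list file contain absolute or otherwise "unsafe" paths.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", dst]


if __name__ == "__main__":
    print(shlex.join(trim_cmd("in.mp4", 12.5, 20.0, "clip.mp4")))
    # ffmpeg -y -ss 12.500 -i in.mp4 -t 7.500 -c copy clip.mp4
    print(concat_list(["intro.mp4", "clip.mp4"]), end="")
    print(shlex.join(concat_cmd("list.txt", "out.mp4")))
```

To execute a command, write the list file to disk first, then pass the argument list to `subprocess.run(cmd, check=True)`. Passing a list rather than a single shell string avoids quoting problems with unusual file names.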
- The following qualifications are not mandatory, but will significantly strengthen a candidate's profile during the evaluation process:
- Agentic Frameworks
  - Deep understanding of frameworks such as LangGraph and LangChain for building complex cyclic state graphs and stateful agent systems.
- Advanced RAG & Memory Management
  - Experience designing long-term memory systems for agents, including mechanisms for storing and retrieving successful execution patterns or scenarios to improve future performance through experience-based retrieval.
- Self-Corr
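The experience-based retrieval described under Advanced RAG & Memory Management can be illustrated with a toy in-memory store. Successful runs are saved together with an embedding. A new request then retrieves the most similar past scenarios by cosine similarity. This is a minimal sketch under several assumptions. The embeddings would come from some embedding model that is not shown here. The `min_score` threshold for a "successful" run is a hypothetical choice. At scale, a vector database would replace the linear scan. `MemoryEntry`, `ExperienceMemory`, and `cosine` are illustrative names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    embedding: list[float]  # produced by some embedding model (assumed)
    scenario: str           # summary of a past pipeline run
    score: float            # hypothetical quality rating from review


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ExperienceMemory:
    entries: list[MemoryEntry] = field(default_factory=list)
    min_score: float = 0.7  # hypothetical threshold for a "successful" run

    def store(self, embedding: list[float], scenario: str, score: float) -> bool:
        """Keep only runs rated at or above min_score; return whether stored."""
        if score < self.min_score:
            return False
        self.entries.append(MemoryEntry(embedding, scenario, score))
        return True

    def retrieve(self, query: list[float], k: int = 2) -> list[str]:
        """Return the k stored scenarios most similar to the query embedding."""
        ranked = sorted(self.entries, key=lambda e: cosine(query, e.embedding), reverse=True)
        return [e.scenario for e in ranked[:k]]


if __name__ == "__main__":
    memory = ExperienceMemory()
    memory.store([1.0, 0.0, 0.0], "fast-cut travel montage", 0.9)
    memory.store([0.0, 1.0, 0.0], "slow narrative short", 0.8)
    memory.store([0.9, 0.1, 0.0], "abandoned montage draft", 0.3)  # below threshold, not stored
    print(memory.retrieve([0.95, 0.05, 0.0], k=1))  # ['fast-cut travel montage']
```

The store filters on outcome quality before saving. Retrieval therefore surfaces only patterns that previously worked, which is the point of experience-based memory as opposed to plain document RAG.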
Benefits
Additional Information
We are toogeza, a Ukrainian recruiting company focused on hiring talent and building teams for tech startups worldwide. People make a difference in the big game, and we can help find the right ones. Currently, we are looking for an AI Video Agent Engineer for Elva.
Location: Remote
Job Type: Full-Time