#EG AI Tech Lead

External
Ncs3 · Singapore, Switzerland
Full-time · On-site · 5d ago
Caching · CI/CD · Docker · Git · Kafka · LangChain

About the role

What will you do?

Technical Delivery
- Own end-to-end delivery of AI workstreams: from requirements through data preparation, modelling, integration, testing, and production handover
- Develop agentic applications including RAG pipelines, prompt-engineered agents, and agentic workflows using LangChain, LlamaIndex, LangGraph, or plain Python
- Build on top of GenAI application stacks including LLM orchestration, observability (Langfuse, Braintrust), guardrails, and LLM gateway patterns (LiteLLM, Portkey)
- Implement multi-agent orchestration layer: event routing, resource locking, inter-agent handoff contracts, prompt caching, and shared state management
- Implement agentic design patterns including workflow evaluation, LLM-as-Judge, and AI red teaming
- Adopt Model Context Protocol (MCP) and Agent-to-Agent (A2A) Protocol as foundational extension mechanisms, enabling agents to operate safely across enterprise ecosystems
- Work with AI Solution Architect and Business Analysts on benchmarking and golden-set compilation for agent evaluation
- Implement effective RAG architectures: chunking strategies, embedding selection, vector store configuration (Qdrant, Milvus, pgvector), hybrid search, and reranking
- Build and maintain evaluation harnesses measuring correctness, faithfulness, latency, and safety. Run continuous LLM-as-Judge evaluation on production traces
- Write clean, maintainable Python; participate in code reviews; contribute to shared AI platform libraries

OSS Integration & Data Engineering
- Implement REST, NETCONF/YANG, SNMP trap ingestion, and gNMI streaming telemetry integrations connecting agents to NMS and EMS systems
- Build ServiceNow integrations for ticket read/write, triage updates, change record queries, and human-in-the-loop approval workflow triggers for the Execution Agent
- Build the alarm and telemetry data normalisation pipeline converting multi-vendor OSS data into agent-consumable schemas
- Implement Kafka or equivalent event streaming consumers delivering real-time alarm data to the Alarm Correlation and Ticket Triaging agents
- Work with data and integration engineers to normalise multi-vendor, multi-format network telemetry into consistent schemas the agents can reason over

Safety, Guardrails & Risk
- Implement guardrail architectures: hard limits, soft limits with human confirmation flows, blast radius controls, and escalation logic
- Assist the AI Solution Architect to assess blast radius for each agent - documenting worst-case impacts and designing specific mitigating controls
- Implement and validate rollback procedures for all agent-initiated network actions, including automated rollback triggers based on post-execution KPI degradation
- Work with QA engineers to design and test solutions via defined test cases, and review required improvements
- Ensure all agent architectures comply with IMDA regulatory requirements, AIVerify framework requirements, InfoSec policies, and CMB change governance frameworks

Observability & Evaluation
- Implement the Decision Audit Trail schema for all agents ensuring every decision is fully traceable, explainable, and available for post-incident review
- Build the Agent Operations Dashboard for the NOC team: real-time agent status, decision volumes, accuracy metrics, and exception alerts
- Define and implement the evaluation framework: factual, reasoning, tool-call, retrieval and planning accuracy, confidence thresholds, and human feedback capture
- Implement continuous evaluation of production agent traces using LLM-as-Judge for quality and safety; design human evaluation sample process
- Build drift detection mechanisms that alert the team when agent output quality degrades before it becomes an operational incident
- Containerise and deploy agent services using Docker and CI/CD pipelines; manage staging and production deployments with rollback capability

The ideal candidate should possess:
- Minimum 3 years of AI/ML engineering experience, with at least 1 year in production Agentic AI systems (not PoC or prototype-only)
- Demonstrable experience building AI agents that call external tools, manage state, and operate in production environments with real operational consequences
- Hands-on experience with LLM APIs (OpenAI, Anthropic Claude) or Open Source LLMs on vLLM (Qwen, Mistral, DeepSeek) in production - including prompt engineering, evaluation, and cost management
- Experience building RAG pipelines with vector databases (Qdrant, Milvus, Pinecone, Weaviate, pgvector) in production, including hybrid search and reranking
- Practical experience with at least one agentic AI framework: LangGraph, LangChain, LlamaIndex, CrewAI, AutoGen, or custom tool-calling
- Strong Python - modular, testable, production-quality code
- Proficiency with Git, CI/CD, Docker, and LLMOps practices
- Full-stack awareness: REST APIs, front-end integration, data pipelines
- Degree in Computer Science, Electrical Engineering, AI/ML, or related technical field

Highly Desirable
- Experience with Telco, utilities, or other 24×7 critical

