Senior Machine Learning Engineer
About the role
We are seeking a highly skilled and innovative Machine Learning Engineer with 4-6 years of experience, specializing in Generative AI and Agentic AI, to join our dynamic team. This role is critical for designing, developing, deploying, and maintaining advanced AI solutions that leverage large language models (LLMs) to create intelligent, autonomous agents capable of performing complex tasks and driving significant business value within the telecom sector. The ideal candidate will possess extensive hands-on experience across the entire Generative AI lifecycle, with a strong focus on MLOps, performance monitoring, and ensuring the robustness and scalability of AI systems in production. Exposure to Google Cloud Platform (GCP) and a background in the telecom industry are highly advantageous.

Key Responsibilities:

MLOps & Production Systems:

Performance Monitoring and End-to-End Tracing: Implement and manage comprehensive monitoring solutions for AI models and agent systems in production, tracking key performance indicators (KPIs), latency, throughput, and resource utilization.
Drift Detection Techniques, Alerting & Observability: Develop and deploy robust drift detection mechanisms (data drift, concept drift, model drift) and establish proactive alerting systems to ensure model integrity and performance.
Application Performance Management (APM): Integrate AI systems with APM tools to gain deep insights into application behavior, identify bottlenecks, and optimize overall system performance.
Systems Tracing: Implement distributed tracing across complex AI workflows and microservices to provide end-to-end visibility and facilitate efficient debugging and performance optimization.
Full-Stack Application Architecture: Collaborate with software engineers to integrate AI components seamlessly into existing and new full-stack applications, ensuring scalability, reliability, and maintainability.
Deploy Generative and Agentic AI models and systems at scale, optimizing for latency, throughput, robustness, and cost-efficiency on cloud platforms (GCP preferred).
Implement and manage CI/CD pipelines for AI solutions, ensuring seamless integration, testing, and deployment.
Develop strategies for continuous improvement, model retraining, and A/B testing in production environments.

Generative AI & Agentic AI Development:

Design, develop, and optimize enterprise-grade Generative AI solutions, leveraging LLMs for various applications within the telecom domain.
Implement sophisticated prompt engineering strategies (e.g., Chain of Thought, Few-Shot, RAG) to maximize LLM reasoning and creativity.
Architect, develop, and deploy robust and scalable Agentic AI systems capable of autonomous decision-making, task execution, and complex problem-solving.
Utilize and contribute to agentic frameworks such as Google ADK (Agent Development Kit) and LangChain to build sophisticated multi-agent systems and orches