Senior AI DevOps / LLMOps

TechBiz Global GmbH · Baden-Baden
Full-time · On-site · 3d ago
AWS · Azure · Kubernetes · Terraform · CI/CD · GitHub Actions

About the role

At TechBiz Global, we provide recruitment services to the top clients in our portfolio. We are currently seeking a Senior AI DevOps / LLMOps specialist to join one of our clients' teams. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.

Key Responsibilities

1. Automation of Build-to-Production
- Design and implement robust CI/CD pipelines tailored for AI, covering model weights, dataset versioning, and application code.
- Develop specialized workflows for PromptOps, ensuring that system prompts are version-controlled, tested for regressions, and deployed with the same rigor as traditional code.
- Automate the deployment of agentic workflows, managing the complexities of stateful AI interactions and multi-agent handoffs.

2. AI Infrastructure as Code (IaC)
- Provision and manage high-performance compute environments (GPU clusters, TPU pods) using Terraform, Pulumi, or Ansible.
- Define and enforce Policy-as-Code for AI endpoints to ensure compliance with security, cost and usage limits, and data residency requirements.
- Maintain a consistent environment across hybrid infrastructure, ensuring parity between on-premises development and cloud production.

3. Safe Experimentation & Controlled Releases
- Architect progressive delivery strategies for AI, including canary releases, blue-green deployments, and shadowing (where new models run in parallel with production to compare outputs).
- Build "Evaluation-in-the-Loop" gates within the pipeline to automatically test for bias, hallucination, and performance degradation before a release.
- Implement A/B testing frameworks specifically designed for LLM outputs and agentic behavior.

4. Monitoring & Observability
- Establish deep observability into inference endpoints, tracking metrics such as tokens per second, latency, and drift in model accuracy.
- Integrate feedback loops that capture production "edge cases" to feed back into the training and fine-tuning pipelines.

Must-Have Technical Skills
- Orchestration: Advanced Kubernetes (K8s) skills, specifically with Kubeflow, Ray, or NVIDIA Triton.
- CI/CD & IaC: Expertise in GitHub Actions/GitLab CI, and Terraform or Pulumi.
- AI Tooling: Experience with Weights & Biases, MLflow, LangSmith, or Arize Phoenix.
- Hardware: Understanding of GPU virtualization, CUDA drivers, and on-premises hardware management.
- Security: Familiarity with Open Policy Agent (OPA) and secret management (Vault).

Experience
- 10+ years in DevOps, SRE, or Cloud Engineering.
- 2+ years of hands-on experience in MLOps or LLMOps, specifically moving LLMs from notebook to production.
- Proven experience managing hybrid cloud environments (e.g., AWS/Azure + private data center).


Interested in this role?

Apply on the company's website.
