AI Solution Architect
Full-time, Remote
Ansible, CI/CD, Docker, Documentation, Helm, Hugging Face
Responsibilities
- Architect & Deploy: Design end-to-end GPU cluster architectures (on-premises and cloud) using Ansible, Terraform, Kubernetes, and Slurm.
- Customer Engagement: Lead technical deep-dives, conduct workshops, and present solutions to stakeholders at all levels.
- Automation & IaC: Build and maintain Infrastructure as Code modules to automate provisioning, scaling, and monitoring of GPU resources.
- Documentation & Enablement: Produce whitepapers, runbooks, and training materials; host webinars and training sessions.
- Feedback Loop: Partner with Gcore's engineering and product teams to relay customer insights and drive product enhancements.
Requirements
- Experience: 3+ years in Cloud or GPU AI Infrastructure DevOps.
- Infrastructure Skills: Proven track record deploying GPU clusters at scale, including multi-node, multi-GPU setups.
- Automation Expertise: Hands-on with Ansible or similar configuration management tools; Terraform (IaC).
- Orchestration & Scheduling: Strong familiarity with Kubernetes (K8s) and Slurm.
- Programming: Proficient in Python / Go.
- ML Proficiency: Solid understanding of ML ecosystems, including models, tooling, and production deployment patterns.
- Communication: Excellent verbal and written skills; ability to translate complex technical concepts for diverse audiences.
- Experience deploying high-availability inference infrastructure for production AI workloads.
- MLOps Pipelines: Implement and optimize distributed training and inference pipelines with MLflow, REST APIs, and popular frameworks (PyTorch, TensorFlow, JAX).
- Demonstrated ability to transition ML pipelines from proof-of-concept to robust, scalable production systems.
- Familiarity with GitOps workflows, Docker, Helm charts, and CI/CD for ML.
- Knowledge of Hugging Face Transformers, scikit-learn, and experiment tracking best practices.
Benefits
At Gcore, we want you to do your best work and enjoy the journey. Our benefits are designed to support your growth, well-being, and life beyond work:
- Competitive compensation
- Flexible working hours and hybrid or remote options, depending on your role
- Work from anywhere in the world for up to 45 days per year
- Private medical insurance for you and your family*
- Extra paid vacation and sick leave days*
- Support for life's important moments and celebrations
- Language courses to help you connect and grow
- Modern, welcoming offices with snacks, drinks, and entertainment*
- Team sports and social activities*
*Benefits may vary depending on your location.
Equal Opportunity Employer
We provide equal opportunity to all applicants without regard to race, color, religion, sex, sexual orientation, age, gender identity, gender expression, national origin, disability, or any other legally protected characteristics.
Vision insurance, Paid time off, Remote work options, Flexible schedule
Additional Information
As an AI Solution Architect at Gcore, you will serve as a trusted advisor to our AI-focused customers. You'll collaborate closely with clients to design and deploy large-scale GPU clusters, containerized training pipelines, and production inference systems. Your expertise in automation, infrastructure as code, and orchestration will ensure seamless, repeatable deployments across hundreds to thousands of GPUs.