AI Cloud Engineer
Requirements
- Technical Skills
- Solid proficiency in Infrastructure as Code for infrastructure automation.
- Strong hands-on experience with cloud services in GCP (Compute Engine, Kubernetes Engine, BigQuery, Cloud Storage) and AWS (EC2, S3, Lambda, ECS/EKS).
- Knowledge of containerization (Docker) and orchestration (Kubernetes).
- Familiarity with CI/CD tools such as Jenkins, GitHub Actions, or GitLab CI.
- AI & Data Infrastructure
- Understanding of AI model lifecycle, model serving, and data-intensive infrastructure.
- Exposure to data pipeline technologies (Pub/Sub, Kafka) and familiarity with analytics services.
- Portfolio
- Evidence of successful cloud infrastructure deployments, ideally showcasing Terraform expertise and cloud automation.
- Contributions to projects involving AI integration or complex cloud architectures are highly beneficial.
- Certification/License
- AWS SysOps Administrator - Associate
- AWS Solutions Architect - Associate
- GCP Associate Cloud Engineer
- GCP Professional Cloud Architect or GCP Professional DevOps Engineer
- Terraform Certified Associate
- Education
- Bachelor's degree in Computer Science, Information Technology, Information Systems, Engineering, or a related field (or equivalent experience).
Additional Information
At Globe, our goal is to create a wonderful world for our people, business, and nation. By uniting people of passion who believe they can make a difference, we are confident that we can achieve this goal.
Job Description
We are looking for an AI Cloud Engineer experienced in cloud platforms, primarily Google Cloud Platform (GCP) and Amazon Web Services (AWS), with a solid foundation in Terraform and infrastructure automation. You will play an essential role in deploying, managing, and optimizing cloud environments tailored for AI-driven solutions. In this position, you'll collaborate closely with backend engineers, data scientists, DevOps specialists, and stakeholders to create reliable, scalable, and efficient AI infrastructure.
DUTIES AND RESPONSIBILITIES:
Cloud Infrastructure & Automation
- Design, deploy, and maintain scalable infrastructure for AI workloads across cloud and on-premises environments.
- Leverage Infrastructure as Code (e.g., Terraform, Helm) to ensure efficient, repeatable, and reliable deployments.
- Continuously optimize cloud resource usage and costs while ensuring infrastructure scalability and reliability.
AI Integration & Model Deployment
- Work closely with AI teams to build and manage cloud-based environments for AI model training, validation, and deployment.
- Develop and maintain CI/CD pipelines tailored for AI applications, ensuring smooth and automated deployment of models and services.
- Ensure high availability and scalability for AI inference endpoints.
Performance Monitoring & Optimization
- Implement robust monitoring and alerting solutions (CloudWatch, Stackdriver, Prometheus, Grafana) for AI workloads and cloud infrastructure.
- Proactively identify and address performance bottlenecks, optimizing resources to reduce latency and improve system efficiency.
- Regularly review performance metrics and recommend improvements for operational excellence.
Security & Compliance
- Implement and manage cloud security best practices, including IAM, encryption, vulnerability scanning, and network security.
- Ensure compliance with industry standards and regulations (e.g., GDPR, HIPAA) specific to AI and data management.
- Collaborate with security teams to conduct regular assessments and integrate security controls into deployment pipelines.
Collaboration & Communication
- Partner with cross-functional teams, including backend engineers, data scientists, and DevOps teams, to ensure cohesive AI platform management.
- Communicate effectively across teams to translate complex infrastructure requirements into actionable plans and solutions.
- Actively participate in agile methodologies, contributing to sprint planning, retrospectives, and continuous improvement initiatives.
KPIs:
- Uptime percentage and reliability of AI model training and inference services.
- Cost efficiency improvements achieved through optimized cloud resource management.
- Minimizing AI infrastructure incidents and maintaining SLA compliance for AI-enabled systems.
TOP 3-5 DELIVERABLES:
- Design and implement scalable, secure, and cost-effective cloud infrastructure on AWS and GCP tailored specifically for AI workloads.
- Develop and maintain Infrastructure as Code (IaC), ensuring consistent, repeatable, and automated cloud deployments.
- Create and optimize CI/CD pipelines for efficient AI model deployment, ensuring seamless integration, continuous availability, and scalability.
- Implement comprehensive monitoring and alerting systems, proactively resolving performance bottlenecks and minimizing system downtime.
- Ensure robust security and compliance measures for AI cloud environments, aligning with industry standards and regulations (e.g., GDPR, HIPAA).