DevOps Engineer - AWS
About the role
We are hiring an AWS Cloud Engineer to design, provision, optimize, and support the AWS infrastructure powering our AMD GPU AI/HPC platform. This is a hands-on execution role - you'll work closely with Rust backend engineers, TypeScript developers, SREs, and platform teams to keep cloud infrastructure reliable, cost-efficient, and scalable. The goal is simple: reduce cloud bottlenecks and give our engineering teams a solid foundation to build on.
Responsibilities
- Own the full lifecycle of AWS infrastructure across dev, staging, production, and customer-facing environments - provisioning, scaling, monitoring, security, cost optimization, and decommissioning
- Build and maintain Infrastructure-as-Code (Terraform, Pulumi, AWS CDK, CloudFormation)
- Implement cloud patterns for high availability, auto-scaling, secure service communication, and customer environment provisioning
- Build and maintain CI/CD workflows for cloud infrastructure and hosted services
- Improve observability through metrics, logging, alerting, dashboards, and runbooks
- Troubleshoot AWS networking, compute, storage, IAM, and deployment issues
- Participate in incident response, post-incident reviews, and root cause analysis
- Document architecture, operational processes, and best practices
Requirements
Required Qualifications
- 5+ years in cloud infrastructure, DevOps, SRE, or platform operations
- Hands-on AWS experience: VPCs, EC2, S3, IAM, CloudWatch, Route 53, load balancers, security groups, private networking
- Proficiency with IaC tooling (Terraform strongly preferred)
- Strong Linux fundamentals - networking, process management, storage, troubleshooting
- Experience with CI/CD, Git-based workflows, and monitoring/alerting platforms
- Clear communicator who can document infrastructure and collaborate across engineering teams
- Experience with AI/ML, GPU, or HPC workloads
- Kubernetes on AWS (EKS or self-managed)
- Observability platforms: Prometheus, Grafana, Loki, OpenTelemetry, Datadog
- AWS cost optimization: right-sizing, savings plans, lifecycle policies, tagging
- Startup or high-growth infrastructure environment background
Additional Information
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.