Infrastructure Engineer - Virtualization
About the role
We are building and operating large-scale infrastructure platforms to support high-performance AI workloads across multiple data centers. Our environment includes GPU-intensive systems, high-throughput networking, and rapidly scaling compute clusters. We are looking for a Virtualization Operations Engineer to focus on the day-to-day operation, stability, and performance of our virtualization platforms. This role is responsible for ensuring that our hypervisor environments are reliable, performant, and scalable as we continue to grow. This is a hands-on operations role working across hypervisors, virtual machines, and underlying infrastructure systems.
Responsibilities
- Operate and maintain large-scale virtualization environments (Proxmox and/or KVM-based systems)
- Manage the full lifecycle of virtual machines: provisioning, configuration, migration, and decommissioning
- Monitor and respond to platform health issues, including host failures, VM performance degradation, and resource contention (CPU, memory, disk, network)
- Troubleshoot and resolve issues across hypervisors, guest operating systems, storage, and networking layers
- Execute infrastructure changes safely, including cluster expansions, host maintenance and upgrades, and configuration updates
- Work with automation tools to standardize deployments, reduce manual intervention, and improve operational consistency
- Collaborate with DevOps (automation and platform tooling), Network Engineering (connectivity and performance), and Storage Engineering (I/O performance and reliability)
- Participate in incident response and root cause analysis
- Contribute to runbooks, documentation, and operational best practices
Requirements
Required Qualifications
- 4-7+ years of experience in infrastructure, systems, or platform operations
- Hands-on experience operating Linux-based virtualization platforms such as KVM/QEMU, Proxmox, or VMware (with strong Linux fundamentals)
- Strong Linux systems knowledge, including process management, networking, and disk and filesystem management
- Experience troubleshooting CPU and memory contention, disk I/O bottlenecks, and network performance issues
- Familiarity with virtualization concepts: VM lifecycle, resource allocation, and live migration
- Experience with infrastructure automation tools (e.g., Ansible or similar)
- Ability to work effectively during incidents and production issues
- Experience operating infrastructure at scale (100+ hosts)
- Familiarity with GPU-based systems or high-performance workloads, and with NUMA awareness and performance tuning
- Exposure to high-throughput networking (bonding, VLANs, SR-IOV) and distributed or high-performance storage systems
- Experience working alongside Kubernetes or container platforms
- Experience in cloud or CSP environments
Additional Information
About TensorWave
Our mission is simple: deliver seamless, secure, reliable, and resilient AI compute at scale. We've built a versatile cloud platform that eliminates infrastructure barriers, empowering builders to focus on innovation instead of fighting their stack. Because breakthrough AI should move at the speed of ideas, not infrastructure.