Infrastructure Operations Operator
About the role
We are hiring an Infrastructure Operations Operator to support the day-to-day operation of Nscale's AI infrastructure environments. This role is ideal for an experienced operator who combines strong technical aptitude with excellent operational judgment and stakeholder management skills. You will play a critical role in maintaining service availability, responding to operational incidents, supporting customers, and ensuring the smooth running of our data centre environments. As a senior member of the operations team, you will take ownership of complex operational issues, mentor junior operators, and contribute to the continuous improvement of operational processes and service delivery. You'll be comfortable working in high-pressure situations, making decisions with incomplete information, and collaborating across multiple teams to achieve the best outcomes for customers and the business. Flexibility to work shifts, participate in on-call rotations, and travel to support operations across multiple data centre locations is essential.
Responsibilities
- Operational Support & Service Delivery
- Monitor and maintain the operational health of Nscale's data centre infrastructure and supporting systems.
- Ensure operational activities are performed in accordance with established procedures and service level agreements (SLAs).
- Support day-to-day service delivery activities, maintaining high levels of reliability and customer satisfaction.
- Participate in shift rotations and provide on-call support as required.
- Support operational activities across multiple data centre locations when needed.
- Incident Management & Troubleshooting
- Lead incident response efforts and coordinate resolution activities during service-impacting events.
- Diagnose and resolve hardware, software, networking, and infrastructure-related issues.
- Escalate complex operational and technical issues appropriately and coordinate with specialist engineering teams.
- Conduct root cause analysis and contribute to post-incident reviews and corrective action plans.
- Support service restoration efforts during critical incidents and high-priority operational events.
- Customer Support & Escalation Management
- Handle advanced customer support requests and operational escalations.
- Act as a technical point of contact during incidents and customer-impacting events.
- Ensure timely communication and resolution of customer issues.
- Maintain a customer-first mindset while balancing operational priorities and business needs.
- Team Leadership & Knowledge Sharing
- Mentor and support junior operators, sharing operational expertise and technical knowledge.
- Assist with onboarding and training of new team members.
- Promote operational excellence and continuous learning across the team.
- Contribute to building a strong operational culture focused on ownership and accountability.
- Infrastructure Operations & Vendor Coordination
- Coordinate with vendors and suppliers for hardware replacements, maintenance activities, and operational support.
- Support asset management processes and maintain accurate infrastructure inventory records.
- Assist with hardware deployment, installation, and lifecycle management activities.
- Ensure operational readiness of infrastructure and supporting systems.
- Process Improvement & Documentation
- Contribute to the development and optimisation of operational processes and workflows.
- Create and maintain operational documentation, runbooks, and standard operating procedures.
- Identify opportunities for automation and efficiency improvements.
- Support operational readiness initiatives, training programs, and business improvement projects.
About You
- Required Experience
- 2+ years of experience in data centre operations, infrastructure operations, technical operations, or a similar operational environment.
- Strong understanding of data centre operations, server hardware, and networking fundamentals.
Benefits
Additional Information
About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility. At Nscale, our Operations team plays a critical role in maintaining service availability, driving operational excellence, and delivering exceptional customer experiences across our AI infrastructure platform. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.