Site Reliability Engineer
Responsibilities
- Help build and improve automation, tooling, and infrastructure that supports AI workloads
- Support the development of operational systems and platform services
- Assist in defining and maintaining basic SLOs/SLIs and monitoring dashboards
- Participate in incident response, troubleshooting, and post-incident reviews
- Investigate and help resolve performance and reliability issues across systems
- Collaborate with Engineering, Networking, and Infrastructure teams to improve system stability
- Contribute to improving availability, scalability, and operational efficiency
- Learn from senior engineers and grow your expertise in reliability engineering
What You Bring
- 2-5 years of experience in Site Reliability Engineering, Systems Engineering, or Software Engineering in a data center environment
- 2+ years of programming experience (e.g., Python, Go, or similar), with an interest in automation and tooling
- Working knowledge of Linux systems, networking concepts, and distributed systems
- Experience troubleshooting system or application issues in production environments
- Familiarity with monitoring or observability tools (e.g., logs, metrics, dashboards)
- Strong willingness to learn and improve reliability and operational practices
- Ability to work in fast-paced environments and collaborate across teams
Preferred Experience
- Exposure to cloud platforms, Kubernetes, or virtualized/bare-metal environments
- Experience in AI, GPU workloads, or high-performance computing (HPC)
- Basic understanding of high-performance networking concepts (e.g., InfiniBand, RDMA)
- Exposure to production monitoring or alerting systems at small or medium scale
What We Can Offer You
- At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
- Highly competitive package (base + equity) with reviews every 12 months.
- Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI. ✨
- Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
- Human-First Flexibility: We treat you as a human first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
Equal Opportunities Statement
- We strongly encourage applications from people of color, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
- If there's anything we can do to accommodate your specific situation, please let us know.
Salary Range
- $100,000 - $170,000 USD
- For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
Additional Information
About Nscale
Nscale is the GPU cloud engineered for AI, purpose-built to deliver high-performance, cost-efficient infrastructure for AI-native startups and global enterprises. We enable organizations to accelerate innovation, reduce the complexity of AI development, and achieve meaningful business outcomes through scalable, sustainable compute. Our culture is defined by ownership, accountability, and rapid innovation. We operate with urgency and transparency, and every team member contributes to building the infrastructure powering the future of AI.