Principal DevOps Engineer

Radiant Industries · El Segundo, CA

Full-time · On-site · 2mo ago

Ansible · AWS · Compliance · Docker · Kubernetes · Linux

About the role

Radiant is seeking a driven Principal DevOps Engineer to own on-site High-Performance Compute (HPC) infrastructure, deployment, and automation projects in tight collaboration with hard-science users and cloud infrastructure engineers. You will work closely with the software team to design scalable, secure, and resilient DevOps practices, tools, and systems across the entire organization. As a technical lead, you will define team scope, shape individual responsibilities, and serve as the primary liaison between engineering and software teams, acting as the subject-matter expert on workflows, tools, and optimizations.

The ideal candidate is patient, organized, and comfortable managing high volumes of cross-disciplinary requests, and is capable of diving deep into complex legacy stacks and synthesizing findings. The infrastructure you manage, the pipelines you build, and the developer productivity culture you establish will help design, run, and mass-produce the first high-temperature gas-cooled portable microreactor ever commercialized.

Responsibilities & Duties

Lead HPC initiatives as driven by the software org, establishing responsibilities, project scope, and technical mentorship.
Serve as the go-between for engineering teams and software, fielding HPC questions, simulation software issues, and infrastructure needs from nuclear, thermal, materials, mechanical, and electrical engineers, and translating them into actionable work.
Own workflows, tooling, and performance for scientific computing, including Ansys, STAR-CCM+, and Abaqus, covering licensing, environment setup, job orchestration, and results infrastructure.
Triage inbound infrastructure requests, HPC/MPI/Linux debugging, job failure analysis, and shell and systems mentorship, while prioritizing effectively and communicating clearly across stakeholders.
Partner with DevOps engineers to architect and maintain infrastructure across AWS and on-premises Linux environments, ensuring high availability, security, and performance for mission-critical systems.
Dive deep into HPC software, workload scheduling, data movement, storage hierarchies, and compute environments to build robust, high-throughput Linux systems.
Modernize legacy scientific computing systems and tooling, migrating to current stacks to improve maintainability, performance, and developer experience.
Architect tools supporting build systems, testing frameworks, deployment automation, and developer environments.
Design and maintain networking infrastructure for distributed simulation systems, optimizing data transfer between HPC clusters.

Required Qualifications & Skills

Bachelor's degree in Computer Science, Engineering, or a related technical field.
8+ years of professional experience in DevOps, Site Reliability, Infrastructure, or Platform Engineering.
Expert-level proficiency in one or more languages: Python, Golang, Rust, C#, or C/C++.
Strong code review skills, including the ability to read stack traces and chase down dense rabbit holes in high-compliance, legacy scientific software environments.
Working experience with Kubernetes and Docker for orchestration and deployment.
Deep Linux and sysadmin fluency (file systems, process management, and networking), with a hard-science approach to problem-solving.
Exceptional communication skills; organized and effective when fielding a high volume of questions and requests from engineering stakeholders.
Ability to dive deep into complex, nested legacy stacks and then zoom out to synthesize findings clearly for a broader technical audience.

Desired Qualifications & Skills

Hands-on physical infrastructure experience: server hardware, rack configuration, data center ops, or high-throughput networking (InfiniBand, Slingshot, AWS EFA).
Scientific computing experience with direct exposure to HPC environments, simulation workloads, or academic/research computing (SLURM, PBS, MPI, Lustre, EFS, AWS ParallelCluster, PCS, etc.).
Cloud DevOps experience: production-grade work with Kubernetes, Ansible, Terraform, or AWS in a cloud-native or hybrid environment.
Experience working cross-functionall

Benefits

Remote work options

Additional Information

About Radiant

Radiant is an El Segundo, CA-based startup building the world's first mass-produced, portable nuclear microreactors. The company's first reactor, Kaleidos, is a 1-megawatt, fail-safe microreactor that can be transported anywhere power is needed and run for up to 5 years without refueling. Portable nuclear power with rapid-deploy capability can replace similar-sized diesel generators and provide critical asset support for hospitals, data centers, remote sites, and military bases. Radiant's unique, practical approach to nuclear development leverages modern software engineering to rapidly deliver safe, factory-built microreactors that use existing, well-qualified materials. Founded in 2020, Radiant is on track to test its first reactor at the Idaho National Laboratory this summer, with initial customer deliveries beginning in 2028.
