HPC Engineer

Sandisk · Bengaluru, India

Full-time · On-site · 1w ago

Ansible · AWS · Azure · Bash · Capacity Planning · Documentation

Responsibilities

Architect, deploy, and manage large-scale distributed HPC environments across global locations, supporting ASIC and GPU compute clusters
Design and implement infrastructure automation using Ansible, Shell, and Python for system lifecycle management
Administer and optimize workload schedulers (LSF, Slurm, NC), including queue configuration, fair-share policies, and job prioritization
Perform deep troubleshooting and root cause analysis across compute, storage, networking, and scheduler layers
Collaborate with engineering teams to improve EDA workload performance and efficiency in global HPC environments
Develop and deploy self-service automation solutions to reduce manual effort and improve system reliability
Manage and support the EDA ecosystem, including tool deployment (Cadence, Synopsys), licensing, and workflow optimization
Implement monitoring and observability frameworks using tools such as Splunk and Grafana for proactive issue detection
Drive capacity planning, performance tuning, and resource optimization for HPC workloads
Create and maintain technical documentation, runbooks, and operational standards
Provide technical leadership and mentoring, influencing HPC architecture and long-term strategy
Technical Skills
HPC & Scheduling: LSF, Slurm, Network Computer (NC), Grid/Batch scheduling
Operating Systems: Red Hat Enterprise Linux (RHEL), CentOS
Automation & Scripting: Ansible, Shell/Bash, Python
EDA Tools: Cadence, Synopsys, EDA workflows & design environments
Monitoring & Observability: Splunk, Grafana, Prometheus
Storage & Filesystems: NFS, AutoFS, distributed storage systems
Authentication & Access: UNIX/Linux integrated with Active Directory
Infrastructure: On-premises & Hybrid HPC environments
Remote Access & VDI: Exceed TurboX, VNC, NoMachine
Preferred Skills
Extensive experience with job schedulers such as LSF, Slurm, or equivalent platforms
Experience supporting EDA / semiconductor design environments
Exposure to GPU computing and accelerator-based workloads
Knowledge of EDA licensing systems and optimization
Experience with Infrastructure as Code (IaC) and platform standardization
Familiarity with cloud or hybrid HPC architectures (AWS/Azure HPC)
Bachelor's degree in Computer Science, Engineering, or equivalent experience
8+ years of experience in Linux system administration (RHEL/CentOS)
Strong expertise in HPC cluster management and workload schedulers (LSF/Slurm)
Proven experience in automation and scripting (Ansible, Shell, Python, and AI integration)
Hands-on experience managing large-scale HPC or EDA environments
Strong skills in performance tuning, capacity planning, and workload optimization
Excellent troubleshooting and problem-solving skills in complex production environments
Ability to lead projects end-to-end and work with cross-functional teams

Benefits

Remote work options

Additional Information

Role Overview

Experienced Senior HPC Engineer / Architect specializing in Linux-based high-performance computing (HPC) environments, EDA workflows, and automation-driven infrastructure. Proven expertise in designing, managing, and optimizing large-scale distributed HPC clusters supporting ASIC EDA workloads.
