HPC Engineer
External | Full-time | On-site | 1w ago
Ansible, AWS, Azure, Bash, Capacity Planning, Documentation
Responsibilities
- Architect, deploy, and manage large-scale distributed HPC environments across global locations, supporting ASIC and GPU compute clusters
- Design and implement infrastructure automation using Ansible, Shell, and Python for system lifecycle management
- Administer and optimize workload schedulers (LSF, Slurm, NC), including queue configuration, fair-share policies, and job prioritization
- Perform deep troubleshooting and root cause analysis across compute, storage, networking, and scheduler layers
- Collaborate with engineering teams to improve EDA workload performance and efficiency in global HPC environments
- Develop and deploy self-service automation solutions to reduce manual effort and improve system reliability
- Manage and support EDA ecosystem including tool deployment (Cadence, Synopsys), licensing, and workflow optimization
- Implement monitoring and observability frameworks using tools such as Splunk and Grafana for proactive issue detection
- Drive capacity planning, performance tuning, and resource optimization for HPC workloads
- Create and maintain technical documentation, runbooks, and operational standards
- Provide technical leadership and mentoring, influencing HPC architecture and long-term strategy
Technical Skills
- HPC & Scheduling: LSF, Slurm, Network Computer (NC), Grid/Batch scheduling
- Operating Systems: Red Hat Enterprise Linux (RHEL), CentOS
- Automation & Scripting: Ansible, Shell/Bash, Python
- EDA Tools: Cadence, Synopsys, EDA workflows & design environments
- Monitoring & Observability: Splunk, Grafana, Prometheus
- Storage & Filesystems: NFS, AutoFS, distributed storage systems
- Authentication & Access: UNIX/Linux integrated with Active Directory
- Infrastructure: On-premises & Hybrid HPC environments
- Remote Access & VDI: Exceed TurboX, VNC, NoMachine
Preferred Skills
- Extensive experience with job schedulers such as LSF, Slurm, or equivalent platforms
- Experience supporting EDA / semiconductor design environments
- Exposure to GPU computing and accelerator-based workloads
- Knowledge of EDA licensing systems and optimization
- Experience with Infrastructure as Code (IaC) and platform standardization
- Familiarity with cloud or hybrid HPC architectures (AWS/Azure HPC)
- Bachelor's degree in Computer Science, Engineering, or equivalent experience
- 8+ years of experience in Linux system administration (RHEL/CentOS)
- Strong expertise in HPC cluster management and workload schedulers (LSF/Slurm)
- Proven experience in automation and scripting (Ansible, Shell, Python, and AI integration)
- Hands-on experience managing large-scale HPC or EDA environments
- Strong skills in performance tuning, capacity planning, and workload optimization
- Excellent troubleshooting and problem-solving skills in complex production environments
- Ability to lead projects end-to-end and work with cross-functional teams
Benefits
Remote work options
Additional Information
Role Overview
Experienced Senior HPC Engineer / Architect specializing in Linux-based high-performance computing (HPC) environments, EDA workflows, and automation-driven infrastructure. Proven expertise in designing, managing, and optimising large-scale distributed HPC clusters supporting ASIC EDA workloads.
Company Intel