Senior Storage System Software Developer

External
LLNL · Livermore, CA
Full-time · On-site · 3d ago
Git · Leadership · Linux · Python

Benefits

Vision insurance

Additional Information

We have an opening for a Senior Storage System Software Developer on a team that researches, designs, develops, maintains, and integrates software and hardware solutions that underpin scalable storage services within the Livermore Computing high-performance computing center. In this role, you will apply software development experience and broad systems-level mastery to support production parallel file systems and archival storage systems. This includes troubleshooting, debugging, assisting system administration staff in isolating software defects on production systems, and developing software independently to address challenging issues on large-scale systems. Additionally, this position includes opportunities to implement new software features in archival storage and file systems such as High-Performance Storage System (HPSS), Lustre, and ZFS.

This position is in the Livermore Computing Division within the Computing Principal Associate Directorate.

This position offers a hybrid schedule, blending in-person and virtual presence. You will have the flexibility to work from home up to two days per week.

This position will be filled at either the SES.3 or SES.4 level, based on knowledge and related experience as assessed by the hiring team. Additional job responsibilities (outlined below) will be assigned if hired at the higher level.

You will

- Provide software engineering support for production file systems and/or long-term archival storage systems running at petabyte and billion-object scale.
- Troubleshoot and debug highly scalable software-defined storage systems such as Lustre and/or HPSS.
- Contribute to long-term maintenance of HPSS and/or Lustre codebases along with related projects (ZFS, Lustre Monitoring Tools, storage quota systems, etc.).
- Design, implement, and maintain new features and performance improvements for HPSS and/or Lustre (and related projects).
- Review colleagues' code changes and integrate upstream patches into local versions of Lustre and/or HPSS codebases.
- Develop and refine storage system monitoring applications.
- Collaborate with cross-functional teams and across organizations to implement innovative solutions and/or resolve system-wide performance degradations and functionality defects in production storage systems.
- Perform other duties as assigned.

Additionally at the SES.4 level, you will

- Serve as a technical subject matter expert and provide technical leadership for complex storage software and systems efforts.
- Mentor and develop technical staff across the organization and share expertise broadly with the next generation of storage professionals.
- Lead cross-functional efforts to diagnose and resolve critical system-wide performance, scalability, and reliability issues.
- Identify and integrate innovative approaches using new technologies, articulating alternative solutions and their impacts.
- Provide strategic technical guidance to project stakeholders, management, and partner organizations.

Qualifications

- Ability to maintain a U.S. DOE Q-level security clearance, which requires U.S. citizenship.
- Bachelor's degree in computer science or related field, or the equivalent combination of education and related experience.
- Significant experience with file system internals and/or with hierarchical storage system concepts and systems - including tiered storage systems that integrate flash, HDD, and tape - used to implement long-term archival storage systems.
- Significant experience in a production high performance computing environment.
- Experience operating storage systems in a production high performance computing (HPC) environment where unplanned downtime has significant operational consequences and end-user impact.
- Advanced proficiency developing software in a team environment with two or more of the following programming languages: C, C++, Rust, or Python.
- Proficiency in Linux command line environments.
- Proficiency with distributed version control software (example: git).
- Advanced verbal and written communication skills necessary to effectively collaborate in a team environment, present and explain technical information, and provide advice to management.
- Proficiency with Linux debugging and inspection tools (examples: strace, perf, gdb, the /proc file system, and/or eBPF).
- Availability to work off-hours to resolve production problems, on an as-needed basis, and on a 24×7 on-call rotational schedule.

In Addition, at the SES.4 Level

- Highly advanced knowledge of and significant technical experience with Linux operating systems and/or Linux kernel interfaces, as well as experience with configuration, networking, and system security.
- Advanced problem-solving and debugging skills to diagnose multi-component problems in highly parallel, multi-threaded systems and identify the root cause of complex storage system issues.
- Proficient verbal, written, and interpersonal communication skills necessary to interact with all levels of personnel and effectively collaborate in a multi-disciplinary team en


Interested in this role?

Apply on the company's website.