Infrastructure & HPC Systems Engineer
Company: Federal Reserve Bank of Philadelphia

The Federal Reserve Bank of Philadelphia is one of the 12 regional Reserve Banks that, together with the Board of Governors in Washington, D.C., make up the Federal Reserve System. It helps formulate and implement monetary policy, supervises banks and bank and savings and loan holding companies, and provides financial services to depository institutions and the federal government. The Federal Reserve Bank of Philadelphia serves eastern and central Pennsylvania, southern New Jersey, and Delaware.

You will ensure the integrity, reliability, and availability of agile research computing environments by managing Windows/Linux server infrastructure, high-performance computing (HPC) clusters, and cloud/colocation/on-premises services. You will provide advanced, specialized technical support to end users while developing automation tools and optimizing computational workflows to meet evolving, rigorous research needs. You will foster trust, open communication, shared goals, and collaboration with stakeholders across the Federal Reserve System and externally.

The salary grade for this position is 16. Final salary and offer will be determined by the applicant's background, experience, and skills, as well as internal equity and alignment with market data.

Job Description:

Infrastructure & Operations
Respond to problems and maintain Windows and Linux server environments in research settings
Design, deploy, configure, and administer HPC clusters and associated systems
Monitor system health, performance metrics, and resource utilization to ensure optimal, efficient operation
Implement robust security protocols and perform regular maintenance, including upgrades and patching
Manage job scheduling and workload optimization using tools like Slurm
Support and troubleshoot user endpoints, servers, and services in various environments (i.e., cloud, on-premises, colocation)
Participate in planning, budgeting, and monitoring of various environments

Development & Automation
Develop tools and scripts to automate management and creation of systems and services in various environments
Create and maintain automation scripts to streamline system administration tasks
Optimize scientific applications and computational workflows for performance
Implement container technologies (Docker) for reproducible research
Support GPU computing and accelerator technologies for specialized workloads
Design and implement innovative HPC solutions to address evolving research requirements
Define and track performance metrics to ensure efficient current and future use of resources

End User Support & Technical Assistance
Respond to research end-user requests to diagnose problems and provide specialized technical support
Troubleshoot highly complex hardware and software issues in multi-user research environments
Resolve problems quickly and accurately, with thorough follow-up to ensure complete resolution
Assist staff with IT-related problem resolution and use of facilities

Partnership & Collaboration
Partner closely with researchers to understand computational needs and translate them into technical solutions
Collaborate with network, security, and data teams to ensure integrated operations
Build and maintain relationships with vendors and technology partners
Serve as a technical advisor on infrastructure planning and technology roadmaps
Participate in product and technology evaluations, testing, and pilot activities to provide sound recommendations
Engage in Federal Reserve System, academic, and other HPC communities to stay current with emerging technologies and effective practices

Documentation & Training
Develop comprehensive documentation for systems, policies, and procedures
Create user guides and training materials for researchers utilizing HPC resources
Conduct workshops and training sessions on effective use of HPC resources and research computing tools

Education and Experience:
Bachelor's degree in computer science, engineering, mathematics, or a related field, or an equivalent combination of education and experience.
Minimum of 5 years of relevant experience in HPC administration and systems engineering.

Knowledge and Skills:
Extensive experience with Linux operating systems (Red Hat/CentOS) in an HPC environment.
Command-line skills and proficiency in scripting languages (Python, Bash).
Experience with job scheduling systems (Slurm) and resource management.
Knowledge of parallel file systems and storage technologies (e.g., Ceph, GPFS, Lustre, BeeGFS).
Familiarity with parallel programming models (MPI, OpenMP) and scientific computing frameworks.
Experience with configuration management and automation tools (Terraform).
Demonstrated specialized problem-solving abilities and analytical thinking.
Solid appreciation for research, sound judgment, and healthy professional skepticism; understands sensitivities and considers the big picture in addition to tactical details.
Ability