Staff Platform Engineer
Responsibilities
- Responsible for designing and implementing top-priority technical tasks and delivering them on time, meeting and helping define the required level of quality.
- Participate in complex and challenging activities, including design and implementation.
- Responsible for a scope of significant size critical to the team's success.
- Provide technical leadership and effectively mentor interns and less experienced team members.
- Actively participate in the hiring process and provide fair and productive interview feedback.
- Negotiate complex decisions, present options and persuasively advocate for optimal technical solutions, internally and externally.
- Apply technical knowledge of existing and emerging technologies, including public cloud offerings from Amazon Web Services, Microsoft Azure, and Google Cloud.
- Act as a technical consultant and resource for faculty research, teaching, and/or administrative projects.
- Perform other related work as needed.
Requirements
- Education:
  - Minimum requirements include a college or university degree in a related field.
- Work Experience:
  - Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline.
- Certifications:
  - None listed.
- Advanced degree in computer science, mathematics, statistics, engineering, or a relevant quantitative field strongly preferred.
- 6+ years of professional experience as a systems or DevOps engineer, or equivalent skills and qualifications demonstrated through projects, initiatives, or outstanding performance.
- Hands-on scripting experience (Bash, Python, or another dynamic language).
- Unix/Linux programming or system administration experience.
- Experience with OpenStack and AWS (EC2/S3) cloud technologies.
- Experience with configuration management utilities (Chef, Puppet, Ansible).
- Experience with F5 or other load balancing technologies (Nginx, AWS ELB/ALB, etc.).
- Experience with source control and build systems (SVN, Git, Jenkins, etc.).
- Experience with container-based deployment (Docker, Kubernetes).
- Experience with log aggregation tools (ELK stack, Splunk).
- Experience with security frameworks (FISMA, NIST, FIPS).
- Experience with cloud platforms (AWS, GCP, OpenStack), CI/CD, and Agile methodologies.
- Experience leading DevOps initiatives and process improvement.
- Experience with deploying, maintaining, and monitoring AI/ML models and infrastructure.
- Preferred Competencies
- Ability to collaborate with team members, help define guidelines and best practices, and ensure accountability for deliverables and outcomes.
- Ability to take and provide constructive and helpful input and feedback on technical issues.
- Ability to negotiate complex decisions, present options and persuasively advocate for optimal technical solutions, internally and externally.
- In-depth knowledge in most technical ar
Additional Information
Department
BSD CTD - Platform Engineering - GDC
About the Department
The Center for Translational Data Science (CTDS) at the University of Chicago is a research center whose mission is to develop the discipline of translational data science and apply it to impactful problems in biology, medicine, healthcare, and the environment. We envision a world in which researchers have ready access to the data they need and the tools required to make data-driven discoveries that increase our scientific knowledge and improve the quality of life. We architect ecosystems of large-scale commons of research data, computing resources, applications, tools, and services that the broader research community can use to pursue scientific inquiry at scale and accelerate discovery. Learn more at https://gdc.cancer.gov/, https://gen3.org/, https://stats.gen3.org/, and https://ctds.uchicago.edu/.
Job Summary
Platform (aka DevOps) Engineers provide production support, production monitoring, CI/CD design and implementation, security automation, and AI/ML infrastructure management across the open-source software platforms that CTDS develops and operates for translational data science. Production support includes triaging, researching, communicating, and addressing production incidents. For monitoring, staff wrangle disparate system monitoring assets and develop common analytics that inform optimization, define benchmarks and confidence intervals, and support forecasting, proactively mitigating production incidents. CI/CD pipelines target a hybrid cloud architecture spanning on-premises infrastructure and commercial cloud providers such as Amazon, Google, and Microsoft. Additionally, the position is responsible for AI/ML research infrastructure, including managing and optimizing on-premises GPU resources and AWS cloud services such as Bedrock and SageMaker.
Responsibilities include deploying, monitoring, and maintaining machine learning models for inference, optimizing model and hardware performance, troubleshooting AI/ML solutions, and integrating them within the broader application environment to support research and production workflows.