Software Engineer, ML Platform
About the role
We are seeking a Software Engineer to join our Platform team to design, build, and deploy the AI infrastructure that powers our world-class research team. In this role, you'll collaborate closely with AI Scientists and other engineers to enable the effective use of thousands of GPUs for training and inference of cutting-edge biological foundation models. The role spans a range of problems and skill sets, from MLOps for cutting-edge GPU clusters to backend engineering of control plane APIs. Our ideal candidate has an opinion about Slurm or Kubernetes for model training, cares about maximizing bandwidth from the storage subsystems to the GPU, and can build the API paved path for submitting training jobs that can dispatch to multiple clusters.
Responsibilities
- Develop and improve our model training system, which dispatches distributed training jobs to clusters across multiple clouds.
- Deploy storage subsystems that improve dataset management and throughput for training datasets.
- Build evaluation infrastructure that enables easy execution and tracking.
- Build base tooling for integrating model training with other internal infrastructure, such as telemetry, experiment tracking, and checkpointing.
Prior experience with biology is not required; we will teach you what you need to know. You'll get to go into the lab, and for our ideal candidate this should be a perk, not a chore!
Preferred Skills and Qualifications
- Degree in Computer Science, Machine Learning, Computational Biology, or a related field.
- 5+ years of industry experience building and deploying ML systems in production environments.
- Experience leading technical projects and driving cross-functional execution.
- Strong programming skills in Python.
- Experience with infrastructure/ops tools such as Terraform and Ansible.
- Experience with deep learning frameworks such as PyTorch and JAX.
- Solid understanding of machine learning.
- Experience with the infrastructure needs of large-scale model training.
- Strong problem-solving skills and ability to work in a collaborative, multidisciplinary environment.
Additional Information
About Xaira Therapeutics
Xaira is an innovative biotech startup focused on leveraging AI to transform drug discovery and development. The company is leading the development of generative AI models to design protein and antibody therapeutics, enabling the creation of medicines against historically hard-to-drug molecular targets. It is also developing foundation models for biology and disease to enable better target elucidation and patient stratification. Collectively, these technologies aim to continually enable the identification of novel therapies and to improve success in drug development. Xaira is headquartered in the San Francisco Bay Area, Seattle, and London.