Senior Distributed ML Engineer
External, Full-time, On-site, 2mo ago
AWS, Azure, Compliance, Deep Learning, Docker, GCP
Responsibilities
- Collaborate with researchers to accelerate research, model training and inference, and facilitate the use of large-scale models in distributed computing environments.
- Investigate performance bottlenecks, profile research experiment code, debug reported issues, and optimize the utilization of computing resources.
- Develop tools and libraries to simplify and orchestrate the use of distributed computing resources for research experiments.
- Establish, document, and maintain best practices for large-scale, distributed ML model development workflows.
Requirements
- 3+ years of experience designing and implementing distributed ML training frameworks, with recent experience using tools such as Megatron, DeepSpeed, HuggingFace Accelerate, FSDP, vLLM, and/or verl.
- Ability to collaborate effectively with cross-functional teams, document best practices, and stay updated with the latest advancements in ML and software development.
- Experience with cloud platforms (e.g., AWS, GCP, Azure) and workload managers (e.g., Ray, SLURM).
- Experience with GPU profiling tools (e.g., PyTorch Profiler, PyProf, NVIDIA Nsight).
- Familiarity with containerization tools (e.g., gRPC, Docker, Kubernetes).
- Familiarity with data infrastructures and platforms (e.g., vector databases).
- A track record of contributing to high-quality research projects in deep learning.
- The title of Engineer is used for reference purposes and may or may not be the official title of the applicant based on jurisdiction.
Benefits
- The opportunity to contribute to a unique mission with a major impact.
- Comprehensive health benefits (including mental health and a wellness management account).
- 20 days of vacation per year upon start.
- Employer contribution of 4% to your retirement savings, with no required employee matching.
- Additional compensation totaling 8% of your salary to apply towards additional retirement savings or bonuses (independent of group and individual performance).
- A team of passionate world-class experts in their field.
- A collaborative and inclusive work environment in our vibrant office space in the heart of Little Italy, in the trendy Mile-Ex district, close to public transportation.
About LawZero
You belong here
At LawZero, diversity is important to us. We value a work environment that is fair, open and respectful of differences. We welcome applications from highly qualified individuals interested in working towards our mission in a respectful, inclusive and collaborative setting.
Health insurance, Paid time off, Performance bonus
Additional Information
We are seeking a senior distributed machine learning (ML) research developer to join our team working on a novel AI safety agenda. In this role, you will work closely with ML research scientists to solve difficult training and inference problems using very large models.