Senior ML Data Platform Developer
Responsibilities
- Design and maintain a layered storage architecture and partner with the Research team to ensure seamless integration with the training pipelines.
- Scale and automate the data processing stack to handle petabytes of data and ensure its smooth operation.
- Ensure efficient use of compute resources, including GPU access for compute-intensive data processing tasks.
- Assist the Infrastructure team in provisioning the compute and storage environments to support scaling.
- Ensure that all datasets, including the intermediate outputs of each transformation stage, are versioned, reproducible, and fully traceable so that they can meet specific and changing experiment needs, and that each is accompanied by a datasheet in accordance with internal Data Governance policies.
- Collaborate with the Research team and other teams to understand their self-service needs around dataset exploration, sampling, and analysis, and develop the tooling to meet them.
Requirements
- A bachelor's degree in a relevant field (e.g., computer science, computer engineering, software engineering) is required.
- 5+ years of experience designing, implementing, and managing web-scale storage or high-performance computing (HPC) systems, or working within large-scale distributed ML data frameworks, with recent experience using tools such as Lustre, Ray, Apache Spark, workflow orchestrators, Apache Arrow, and/or Parquet.
- Ability to collaborate effectively with cross-functional teams, document best practices, and stay up to date with the latest advancements in large-scale data processing and software development.
- Experience with workload managers (e.g., Ray, Kubernetes, SLURM).
- Familiarity with containerization tools (e.g., Docker, Enroot).
- Familiarity with data infrastructures and platforms (e.g., vector databases).
Additional Information
We are seeking a visionary and highly technical Senior ML Data Platform Developer to architect, implement, scale, and maintain the data engine powering our next-generation frontier models. In this high-impact role, you will bridge the gap between cutting-edge AI research and high-performance engineering, treating the data platform as an internal product with our researchers as your primary customers. You will be responsible for designing a multi-tiered, ultra-low-latency storage architecture and building automated, petabyte-scale data processing pipelines. Our technical environment is not fixed and will evolve as our projects scale. We expect someone who can evolve it rather than simply follow industry trends, who will challenge it, and who will make sustainable decisions in close collaboration with our Research and Product teams.