Senior AI Data Infrastructure Engineer

Xpengmotors · Santa Clara, CA
$124K–$210K/yr · Full-time · On-site · 1w ago
Apache · Deep Learning · Java · LLMs · Machine Learning · Python

Responsibilities

  • Scalable Data Pipelines: Architect and build scalable, end-to-end pipelines to automate the ingestion, cleaning, and processing of PB-scale raw data for both production autonomy and multi-modal LLMs.
  • Modern Lakehouse Architecture: Evolve our data storage solutions based on Apache Iceberg and Lance to implement efficient semantic indexing, metadata management, and data versioning.
  • Training Throughput Optimization: Deeply optimize data loading and pre-fetching strategies to ensure maximum throughput for large-scale training on 10,000+ GPU clusters.
  • Infrastructure Evolution: Support the seamless transition of foundation model data into actionable training sets, bridging the gap between raw vehicle logs and model-ready tokens.

Requirements

  • Engineering Excellence: BS/MS/PhD in Computer Science or a related field, with a proven track record of building large-scale distributed systems.
  • Work Experience: 3-5 years of industry experience.
  • Programming Mastery: Proficient in Python, C++, or Java, with a deep understanding of high-performance concurrent programming and systems design.
  • Distributed Frameworks: Hands-on experience with at least one distributed processing framework, such as Ray or Spark.
  • Lakehouse Expertise: Familiarity with Data Lakehouse concepts and practical experience with technologies like Iceberg and Lance.
  • Experience building data warehouses for trillion-token datasets or PB-scale multi-modal data.
  • Deep understanding of data access patterns in deep learning frameworks like PyTorch, DeepSpeed, or Megatron.
  • Practical experience with vector databases, automated labeling toolchains, or data-centric AI workflows.
  • Knowledge of storage formats optimized for AI (e.g., Parquet, Lance) and high-performance file systems.

Benefits

  • Equity / stock options
  • Performance bonus

Additional Information

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.

As a core member of our AI Infrastructure team, you will work at the intersection of Autonomous Driving and Foundation Models. We don't just process EB-scale perception data from tens of thousands of production vehicles; we are building the high-performance Data Engine that powers our next-generation AI. Your work will directly determine how our self-driving systems "learn" from massive datasets and define the cognitive ceiling of multi-modal models in the physical world.


