Senior AI Data Infrastructure Engineer
Responsibilities
- Scalable Data Pipelines: Architect and build scalable, end-to-end pipelines to automate the ingestion, cleaning, and processing of PB-scale raw data for both production autonomy and multi-modal LLMs.
- Modern Lakehouse Architecture: Evolve our data storage solutions based on Apache Iceberg and Lance to implement efficient semantic indexing, metadata management, and data versioning.
- Training Throughput Optimization: Deeply optimize data loading and pre-fetching strategies to ensure maximum throughput for large-scale training on 10,000+ GPU clusters.
- Infrastructure Evolution: Support the seamless transition of foundation model data into actionable training sets, bridging the gap between raw vehicle logs and model-ready tokens.
Requirements
- Engineering Excellence: BS/MS/PhD in Computer Science or a related field, with a proven track record of building large-scale distributed systems.
- Work Experience: 3-5 years of industry experience.
- Programming Mastery: Proficient in Python, C++, or Java, with a deep understanding of high-performance concurrent programming and systems design.
- Distributed Frameworks: Hands-on experience with at least one distributed processing framework, such as Ray or Spark.
- Lakehouse Expertise: Familiarity with Data Lakehouse concepts and practical experience with technologies like Iceberg and Lance.
- Experience building data warehouses for Trillion-token datasets or PB-scale multi-modal data.
- Deep understanding of data access patterns in deep learning frameworks like PyTorch, DeepSpeed, or Megatron.
- Practical experience with Vector Databases, automated labeling toolchains, or data-centric AI workflows.
- Knowledge of storage formats optimized for AI (e.g., Parquet, Lance) and high-performance file systems.
Additional Information
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.
As a core member of our AI Infrastructure team, you will work at the intersection of Autonomous Driving and Foundation Models. We don't just process EB-scale perception data from tens of thousands of production vehicles; we are building the high-performance Data Engine that powers our next-generation AI. Your work will directly determine how our self-driving systems "learn" from massive datasets and define the cognitive ceiling of multi-modal models in the physical world.