Data Architecture Design & Modernization: Design target data architectures for OSS lift-and-shift and cloud-native modernization scenarios. Make architectural decisions balancing performance, cost, scalability, and operational excellence. Design for multi-cloud flexibility where applicable.
Data Platform Migration Strategy: Own the migration approach for big data platforms, including HDFS-to-GCS migration, Hadoop ecosystem modernization, and data warehouse transformation. Establish phased migration patterns and validation strategies.
GCP Data Platform Architecture: Design BigQuery, Dataplex, BigLake, and data lakehouse architectures. Establish data organization patterns, access controls, and metadata management. Optimize for cost, query performance, and data discovery.
Big Data Processing Architecture: Design Spark, Flink, and Dataflow pipelines for batch and streaming data processing. Establish processing patterns, optimization strategies, and cost management approaches. Design for scale, fault tolerance, and performance.
Data Orchestration & Workflow Design: Design Cloud Composer and Airflow DAGs for complex data workflows. Establish orchestration patterns, error handling, monitoring, and retry strategies. Ensure reliable, maintainable workflow execution.
Data Governance & Metadata Management: Design data governance frameworks including data ownership, access controls, and metadata standards. Establish data lineage tracking, data catalogs, and governance policies. Guide governance implementation across platforms.
Data Quality & Validation: Establish data quality frameworks and validation strategies for migrations. Define quality rules, reconciliation criteria, and acceptance thresholds. Implement data quality monitoring and alerting.
Hive Metastore & Metadata Architecture: Design migration strategies for Hive Metastore to BigLake Metastore/Dataplex. Manage table schema migration, partition strategy optimization, and metadata preservation. Ensure compatibility with existing tools.
Query Engine Architecture & Trino Integration: Design query engine architectures supporting multiple data sources. Establish Trino/Presto configurations for federated query access. Ensure compatibility with existing SQL tools and applications.
Streaming Data Architecture: Design Kafka-based streaming architectures for real-time data ingestion. Establish Kafka topic designs, partitioning strategies, and consumer patterns. Design streaming pipelines using Dataflow or Flink.
Data Ingestion & ELT Pipeline Design: Design scalable data ingestion patterns for structured and unstructured data. Establish ELT/ETL frameworks optimized for cloud platforms. Design connectors and integrations with source systems.
Storage Architecture & Optimization: Design GCS-based storage architectures for data lakes. Establish data organization, partitioning, and lifecycle policies. Optimize storage costs and access patterns.
Data Lakehouse Architecture: Design modern lakehouse architectures combining data lake and warehouse capabilities. Establish table formats, schema management, and ACID transaction support. Design for analytics and ML workloads.
Technical Leadership & Mentoring: Provide technical leadership to data engineering teams. Mentor engineers on data architecture patterns, best practices, and technical decision-making. Conduct design reviews and architecture discussions.
Code Quality & Best Practices: Establish code quality standards, design patterns, and testing frameworks. Review data pipeline code and designs. Drive adoption of best practices for maintainability, testability, and performance.
Performance Optimization & Cost Management: Optimize data pipelines for performance and cost. Profile and tune Spark, Dataflow, and BigQuery workloads. Establish cost monitoring and optimization practices across platforms.
Additional Information
Make an impact with NTT DATA
Join a company that is pushing the boundaries of what is possible. We are renowned for our technical excellence and leading innovations, and for making a difference to our clients and society. Our workplace embraces diversity and inclusion: it's a place where you can grow, belong and thrive.
Position Summary
We are seeking a Lead Data Engineer with 5+ years of big data and cloud data architecture expertise to design and modernize data platforms for large-scale transformation programs. This role combines deep hands-on technical expertise with architectural leadership, requiring mastery of big data frameworks (Spark, Flink, Kafka), cloud data platforms (BigQuery, Dataplex, Dataproc, Dataflow), and modern data stack technologies (Trino, Cloud Composer/Airflow, BigLake). The successful candidate will own data architecture design for both lift-and-shift migrations and cloud-native modernization, establish data governance frameworks, and provide technical leadership to engineering teams. You will guide data platform migrations, ensure data quality through validation, and mentor engineers on best practices.