Senior Data Engineer
About the role
Senior Data Engineer with strong expertise across traditional big‑data platforms (Hadoop ecosystem) and modern cloud-native architectures (AWS). Responsible for building scalable, secure, and high‑performance data pipelines that span Hadoop clusters and AWS cloud services. Leverages deep knowledge of distributed systems, Spark optimization, cloud automation, and big‑data management to support analytics, BI, ML, and AI use cases across the enterprise. Ensures reliability, governance, cost-efficiency, and operational excellence across hybrid data platforms. The associate should be self-driven, able to work with minimal guidance, and able to guide the team technically.
Core Responsibilities
Design, build, and maintain high‑volume ETL/ELT pipelines across Hadoop (HDFS, Hive, Spark, Kafka) and AWS (Glue, EMR, Lambda, Step Functions, Redshift).
Develop distributed data processing solutions using PySpark, Spark SQL, and scalable cloud serverless patterns.
Implement reusable data ingestion frameworks for batch (Sqoop, Hive, Spark) and streaming (Kafka, Kinesis).
Optimize data workflows using partitioning, bucketing, compression, and file formats (Parquet/ORC).
Understand hybrid data lake architectures using S3 + HDFS, ensuring governance consistency (Atlas, Ranger, Lake Formation).
Understand reporting requirements, perform data profiling, and create the corresponding design.
Create data flow diagrams and perform data modelling.
Orchestrate jobs using Airflow, Control‑M, Step Functions, or event-driven triggers.
Understand auto-scaling, capacity planning, and performance tuning on EMR and Spark clusters.
Ensure data is protected and compliant with regulatory standards.
Work closely with business stakeholders to enable high‑quality datasets.
Provide technical leadership in architecture decisions, code reviews, and best‑practice adoption, and give technical guidance to peers and junior team members.
Improve reliability, scalability, and performance through automation, autoscaling, and capacity planning.
Own deployment, incident response, and post-incident reviews for production environments, including troubleshooting Spark performance issues, job failures, and cluster bottlenecks.
Understand security best practices (IAM, KMS, security groups, WAF, parameter/secret management).
Optimize cost and usage of AWS resources and recommend architecture improvements.
Collaborate closely with developers, QA, and product teams to streamline release processes.