Data Engineer - Hadoop, Spark & Cloud Platforms

Synechron · Mumbai, India

Full-time · On-site · Today

Ansible · Apache · AWS · Azure · BigQuery · Cassandra

Requirements

Bachelor's or Master's degree in Computer Science, Data Engineering, or related field
4+ years managing large-scale data pipelines, preferably in cloud environments
Experience with the Hadoop ecosystem, Spark, and ETL tools in enterprise settings

Additional Information

Job Summary

Synechron is seeking a skilled ETL Developer with strong expertise in Hadoop ecosystems, Spark, and Informatica to design, develop, and maintain scalable data pipelines supporting enterprise analytics and data warehousing initiatives. This role involves working on large datasets, transforming data, and delivering reliable data integration solutions across on-premise and cloud environments. Your efforts will enable data-driven decision-making, ensure data quality, and support our organization's strategic focus on scalable and compliant data platforms.

Software Requirements

Required:
Hands-on experience with ETL tools: Informatica, Talend, or equivalent (5+ years)
Proven expertise in Hadoop ecosystem components: HDFS, Hive, Pig, Sqoop (5+ years)
Proficiency in Apache Spark: PySpark, Spark SQL, Spark Streaming
Strong programming skills in Python, Java, or Scala for data processing (5+ years)
Experience with SQL and relational databases: Oracle, MySQL, PostgreSQL
Familiarity with cloud data platforms such as AWS Redshift, Azure Synapse, GCP BigQuery

Preferred:
Knowledge of cloud-native data migration and integration tools
Exposure to NoSQL databases like DynamoDB or Cassandra
Experience with data governance and metadata management tools

Overall Responsibilities

Design, develop, and optimize end-to-end ETL pipelines for large-scale data processing and integrations
Build and enhance batch and real-time data processing workflows using Spark, Hadoop, and cloud services
Convert business and technical requirements into high-performance data solutions aligned with governance standards
Perform performance tuning, debugging, and optimization of data workflows and processing jobs
Ensure data quality, security, and compliance with enterprise standards and industry regulations
Collaborate with data analysts, data scientists, and application teams to maximize data usability and accuracy
Automate data ingestion, transformation, and deployment pipelines for operational efficiency
Support platform stability by troubleshooting issues, monitoring workflows, and maintaining data lineage
Implement and improve data governance, metadata management, and security standards
Stay current with emerging data technologies, automation frameworks, and cloud innovations to optimize data architectures

Technical Skills (By Category)

Programming Languages:
Essential: Python, Scala, Java (for data processing and automation)
Preferred: Additional scripting or programming skills (Shell, SQL scripting)

Frameworks & Libraries:
Spark (PySpark, Spark SQL, Spark Streaming), Hive, Pig
Data validation and governance tools (e.g., Atlas, Data Catalogs)
AI/ML frameworks such as LangChain, Hugging Face (preferred)

Databases & Storage:
Relational: Oracle, PostgreSQL, MySQL
NoSQL: DynamoDB, Cassandra (preferred)

Cloud Technologies:
AWS: EMR, S3, Glue, CloudFormation, CDK, Redshift (preferred)
Azure or GCP data services (desired)

Data Management & Governance:
Metadata management, data lineage, data quality frameworks

DevOps & Automation:
CI/CD tools: Jenkins, GitHub Actions, TeamCity
Infrastructure as Code: Terraform, CloudFormation, Ansible

Experience Requirements

4+ years of experience in designing and developing large-scale data pipelines
Proven expertise with Hadoop, Spark, and ETL frameworks in enterprise environments
Hands-on experience integrating data within cloud ecosystems and maintaining data quality
Familiarity with regulated industries such as finance or banking is preferred
Demonstrated ability to troubleshoot performance issues and optimize workflows

Day-to-Day Activities

Develop and maintain data pipelines supporting enterprise analytics and reporting
Optimize ETL workflows for performance, scalability, and data accuracy
Collaborate across teams to understand data requirements and implement technical solutions
Automate data processes and manage infrastructure provisioning using IaC tools
Monitor data processing jobs, troubleshoot incidents, and perform root cause analysis
Maintain documentation for data lineage, workflow configurations, and data security
Support migration and platform upgrade projects ensuring minimal disruption
Stay updated on new data processing tools, cloud architecture, and compliance standards
