GCP Data Engineer - Java & PySpark

External

Synechron · Gurugram, India

Full-time · On-site · 1w ago

Agile · BigQuery · CI/CD · Documentation · GCP · Git

Requirements

Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field
4+ years of experience in data engineering, big data processing, or softwa

Additional Information

Job Summary

Synechron is seeking a highly skilled GCP Data Engineer with expertise in Java and PySpark to lead the development of scalable data pipelines and microservices on Google Cloud Platform. In this role, you will collaborate with cross-functional teams to design, implement, and optimize data-driven solutions that support our global marketing and personalization initiatives. Your technical leadership will enable innovative cloud-native architectures that enhance business capabilities and drive operational efficiency.

Software Requirements

Required Software Proficiency:
Java (preferably version 8 or newer) - extensive experience in developing backend services using Java and Spring Boot
PySpark - proven experience in large-scale data processing within GCP environments
SQL - hands-on experience with relational databases and query optimization techniques
Google Cloud Platform (GCP): DataProc, BigQuery, GKE, Cloud Storage - solid working knowledge of deploying and managing data systems on GCP
UNIX/Linux shell scripting, Python, Perl - practical knowledge in scripting for automation and data manipulation
RESTful web services - experience in designing and implementing APIs for data exchange
Version control: Git - strong familiarity with code management and collaboration workflows

Preferred Software Skills:
GCP-native tools: Dataflow, Cloud Composer, Cloud Storage - experience with fully managed cloud data workflows
Hadoop, Hive, or additional big data processing tools - for data migration and legacy system integration
Machine learning libraries: TensorFlow, Scikit-learn - knowledge for implementing AI/ML features
Data streaming tools: Kafka - experience with real-time data pipelines and event processing

Overall Responsibilities

Design, develop, and optimize large-scale data pipelines leveraging PySpark, DataProc, and BigQuery, ensuring robustness and scalability
Build and maintain microservices and APIs using Java (Spring Boot), deployed on GKE (Google Kubernetes Engine)
Collaborate with product teams, data scientists, and stakeholders to analyze data requirements and translate into technical solutions
Modernize architecture by migrating workloads from Hadoop, Spark, and Hive to GCP cloud infrastructure
Conduct performance tuning, troubleshooting, and system optimization for data pipelines and services
Write detailed technical documentation, including architecture diagrams, APIs, and deployment procedures
Stay informed on emerging trends in cloud data engineering, AI/ML, and big data to apply innovations effectively
Lead code reviews, enforce coding standards, and foster best practices across the team

Technical Skills (By Category)

Programming Languages:
Essential: Java (Spring Boot), PySpark, SQL - proven ability to develop scalable backend and data processing solutions
Preferred: Shell scripting, Perl, Python - for automation and scripting tasks
Databases/Data Management: BigQuery, DataProc, Hadoop, Hive - experience in designing, managing, and optimizing large datasets and query performance
Cloud Technologies: GCP (DataProc, BigQuery, GKE, Cloud Storage) - thorough knowledge of cloud deployment, data integration, and management
Frameworks & Libraries: PySpark, Dataflow for data processing; REST API development with Spring Boot
Development Tools & Methods: Git, CI/CD pipelines (Jenkins, Cloud Build), Agile/scrum practices for collaboration and continuous deployment
Security & Data Governance: Understanding of data security, privacy standards, and best practices for cloud-based data platforms

Experience Requirements

4+ years of professional experience in data engineering, software development, or related roles, preferably on cloud platforms
Demonstrated experience developing and deploying scalable data pipelines and microservices in GCP environment
Proven expertise in Java and PySpark for large data processing and system integration
Familiarity with cloud migration projects, big data frameworks, and AI/ML libraries is a plus
Effective in collaborative, agile team environments with a focus on delivering results

Day-to-Day Activities

Develop, test, and refine data pipelines using PySpark, DataProc, and BigQuery
Design and implement scalable back-end microservices and APIs with Java Spring Boot, deployed on GKE
Participate in Agile ceremonies, including sprint planning, stand-ups, and reviews
Monitor system performance, troubleshoot issues, and optimize data workflows for efficiency and reliability
Collaborate with data scientists, product managers, and analysts to deliver data-driven features
Document system architecture, data models, and deployment instructions to ensure maintainability
Continuously explore new cloud, big data, and AI/ML tools to improve infrastructure and solutions

Interested in this role?

Apply on the company's website.
