Design, build, and maintain scalable batch and streaming data pipelines using Apache Spark and cloud‑native data technologies.
Develop and optimize ETL/ELT workflows to ingest, transform, and curate data from diverse source systems into analytics‑ready datasets.
Implement data modeling and transformation logic to support reporting, dashboards, and downstream analytical and machine learning workloads.
Build and manage data processing workloads within modern lakehouse platforms, including Microsoft Fabric / OneLake (preferred).
Ensure data quality, reliability, and consistency by implementing validation checks, monitoring, and reconciliation processes.
Optimize Spark jobs for performance, cost efficiency, and scalability across large and complex datasets.
Manage and evolve data schemas while handling schema drift and upstream source changes.
Develop reusable frameworks, libraries, and standardized patterns to improve data engineering productivity and consistency.
Implement CI/CD pipelines for data workloads to enable automated testing, deployment, and rollback.
Monitor data pipelines and jobs, troubleshoot failures, and resolve performance or data quality issues.
Partner with analytics engineers, BI developers, and data scientists to understand data requirements and deliver curated datasets.
Collaborate with platform, security, and governance teams to ensure data security, compliance, and proper access controls.
Contribute to Agile delivery processes, including sprint planning, design reviews, and continuous improvement initiatives.
Required Qualifications
Strong experience as a Data Engineer building and operating production data pipelines.
Hands‑on experience with Apache Spark for large‑scale data processing.
Proficiency in Python, SQL, and data transformation best practices.
Experience with cloud‑based data platforms and storage (e.g., Data Lakes, Lakehouse architectures).
Familiarity with Microsoft Fabric, OneLake, or similar analytics platforms (strong plus).
Experience designing and optimizing data models for analytical workloads.
Understanding of distributed data processing concepts, performance tuning, and fault tolerance.
Experience with CI/CD, version control, and infrastructure‑as‑code concepts.
Strong problem‑solving skills and ability to troubleshoot complex data issues.
Excellent communication skills and ability to collaborate across technical and non‑technical teams.
4+ years of experience in data engineering or an equivalent role (preferred).
Preferred / Nice‑to‑Have Skills
Experience with real‑time or event‑driven data processing.
Familiarity with data governance, metadata management, and data quality frameworks.
Exposure to orchestration tools and workflow management systems.
Experience supporting analytical, reporting, or machine learning use cases.
Primary Skills
Apache Spark, Data Engineering, Data ETL, ETL Tools, Python (Programming Language)
Shift Time
Recruiter Info
Hiral Parag Rughani
hparb@allstate.com
About Allstate
The Allstate Corporation is one of the largest publicly held insurance providers in the United States. Ranked No. 84 in the 2023 Fortune 500 list of the largest United States corporations by total revenue, The Allstate Corporation owns and ope
Additional Information
At Allstate, great things happen when our people work together to protect families and their belongings from life's uncertainties. For more than 90 years, our innovative drive has kept us a step ahead of our customers' evolving needs, from advocating for seat belts, air bags, and graduated driving laws to being an industry leader in pricing sophistication, telematics, and, more recently, device and identity protection.
Job Description
Allstate's Data & Analytics Technology organization is seeking a Data Engineer to design, build, and operate scalable, reliable, and high‑performing data pipelines that support enterprise analytics, reporting, and advanced data use cases. In this role, you will focus on building robust batch and streaming data solutions using Apache Spark and modern cloud data platforms. Experience with Microsoft Fabric is a strong plus.
You will work closely with analytics engineers, data scientists, product teams, and platform partners to transform raw, complex data into trusted, analytics‑ready datasets. This role plays a critical part in enabling data‑driven decision‑making by ensuring data quality, performance, scalability, and operational excellence across the data platform.
Experience: 6 to 10 years
Primary Skill: Data Engineer, Apache Spark, ETL, Python
Location: Bangalore
Shift: 1 pm to 9:30 pm
Work from office (Hybrid)