Media Data Engineer (Databricks)
About the role
Join our dynamic Central Europe Backend Engineering team as a Media Data Engineer and play a crucial role in shaping our media analytics capabilities! We're seeking a highly skilled and hands-on professional, proficient in Databricks and data engineering, to design, develop, and implement robust data pipelines, specifically processing campaign data from social media platforms like TikTok, YouTube, and Instagram across Central Europe. This is a high-impact opportunity to contribute to a significant scope: powering media digital products and serving 700 users across the region. Your work will directly enable insightful analytics, drive data-driven decision-making, and help us continuously innovate and optimize our media strategies. If you thrive in a fast-paced environment, love solving complex data challenges, and are passionate about building scalable, efficient data solutions, we encourage you to apply!
Responsibilities
As a Media Data Engineer, you will:
- Develop Core Data Solutions: Design and develop high-quality code within Databricks (leveraging PySpark notebooks and SQL) to meet specific business requirements related to media campaign data. This hands-on development makes up at least 70% of your primary responsibilities.
- Accelerate Development: Utilize existing AI capabilities, such as GitHub Copilot or internal tools like BMAD, to enhance productivity and accelerate development cycles.
- Data Set Assembly: Assemble and prepare large, complex datasets from social media sources, ensuring they meet critical functional and non-functional business requirements for diverse analytics applications.
- Architectural Collaboration: Partner with data asset managers, architects, and development leads to ensure all technical data solutions are fit for purpose, align with architectural blueprints, and deliver high-quality, reliable media campaign data.
- Maintain Standards: Contribute to and actively leverage established coding standards and best practices, ensuring that all services and components are efficient, scalable, and reusable.
- Cross-Functional Partnership: Collaborate effectively with front-end teams, embracing a "data as a product" mindset to ensure seamless data delivery and integration for media insights.
- Adhere to Best Practices: Consistently apply sound development practices and adhere to agreed-upon architectural designs throughout the development lifecycle on Google Cloud Platform.
- Technical Debt Reduction: Proactively identify and define infrastructure revamp initiatives aimed at reducing technical debt and enhancing system longevity.
- Agile Delivery: As an integral member of a Scrum team, deliver data engineering projects efficiently and in alignment with business priorities and agile methodologies.
- Operational Support: Provide timely L3 support for existing data processes, thoroughly analyzing bugs and incidents to ensure system stability and performance of media data pipelines.
- Process Improvement: Identify, design, and implement continuous internal process improvements to streamline and automate backend operations.
- Continuous Learning: Stay abreast of industry trends, emerging technologies, and best practices in data engineering and media analytics, applying this knowledge to drive innovation, foster improvement, and contribute to team-wide knowledge sharing initiatives.
Job Qualifications
We are looking for a candidate with:
- PySpark Expertise: Strong proficiency in PySpark for efficient data processing, transformation, and analysis.
- Databricks Proficiency: Proven hands-on experience with Databricks, including cluster management, notebook development, and job scheduling.
- SQL Mastery: Advanced proficiency in SQL for complex data manipulation, querying, and performance tuning.
- Google Cloud Platform (GCP) Expertise: Demonstrable hands-on experience and strong understanding of GCP services relevant to data engineering (e.g., BigQuery, Cloud Storage, Dataflow, Pub/Sub).
- Data Pipeline Experience: Solid experience in designing, implementing, and optimizing robust data pipelines and ETL/ELT processes using PySpark, Databricks, and GCP data services.
- Data Modeling Knowledge: Familiarity with data modeling, data warehousing concepts, and dimensional modeling techniques.
- Data Architecture Understanding: A clear understanding of data integration patterns, data lake architectures, and best practices for ensuring data quality.
- Social Media Data Experience: Experience working with or integrating data from social media platforms (e.g., TikTok, YouTube, Instagram) is highly desirable.
- Workflow Orchestration (Plus): Experience with Databricks Workflow management and the orchestration of data pipelines (e.g., using Airflow on GCP) is a significant advantage.
Additional Information
Job Location: WARSAW PLANT & GO