Data Engineer II, QuBIT

Amazon.com Services LLC · Bellevue, WA
Full-time · On-site · 3d ago
Python · Java · SQL · AWS · Agile

Requirements

  • 3+ years of data engineering experience
  • Bachelor's degree or above in Computer Science, Computer Engineering, Data Science, Electrical Engineering, or majors relating to these fields, or 3+ years of professional software development experience
  • Experience with one or more object-oriented programming languages (e.g., Java, C/C++, Python)
  • Experience in data warehouse technical architectures, data modeling, infrastructure components, ETL/ELT and reporting/analytic tools and environments, data structures, and hands-on SQL coding
  • Experience with Redshift, Oracle, NoSQL, etc.
  • Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions
  • Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)
  • Knowledge of software engineering best practices across the development life cycle, including agile methodologies, coding standards, code reviews, source management, build processes, testing, and operations
  • Experience building/operating highly available, distributed systems of

Additional Information

What if AI could query your data warehouse and actually understand what the numbers mean, not just return rows? That's the infrastructure we're building, and we need a data engineer to help us scale it.

We've built a semantic layer that sits between raw operational data and AI agents, encoding metric definitions, business logic, entity relationships, data lineage, and query routing into structured knowledge that large language models can consume and reason about. The foundation is in place. Now we need someone to deepen the data models, expand entity coverage, enrich the ontology with causal relationships, and build the pipeline infrastructure that keeps it all fresh and accurate at scale.

In this role, you'll design and maintain the data infrastructure that powers AI-driven analytics for workforce Learning across Amazon's fulfillment network. That means building SQL pipelines in Redshift that process millions of daily records from nine upstream platforms, defining entity schemas with join keys, primary keys, and PII classifications, writing metric definitions with traceable formulas grounded in actual ETL logic, and modeling granularity levels that tell AI agents whether to query at the associate, site, or network level. You'll own the full stack from raw ingestion through transformation to semantic enrichment.

You'll also work directly with business stakeholders to translate their domain expertise into structured metadata. When a Regional Learning Manager explains that "training compliance resets weekly on Sunday" or "this site type structurally can't meet that threshold," you'll encode that context into the semantic layer so AI agents handle it correctly without human intervention. Over time, you'll push this toward a world model: not just what metrics exist, but how they relate causally, what drives them, and what happens when they change.

We're looking for someone who thinks about data infrastructure as more than pipelines and tables. You'll work with knowledge graphs, entity relationship modeling, YAML-based ontologies, vector embeddings for retrieval, and the prompt engineering that ties it all together. If you want to build the data systems that make AI genuinely useful for business decision-making, at Amazon scale, this is the role.

Key job responsibilities

  • Design and maintain semantic layer infrastructure including entity schemas, metric definitions, data lineage, and query routing logic that enables AI agents to accurately query and interpret warehouse data
  • Build and optimize SQL pipelines in Redshift processing millions of daily records from multiple upstream platforms, ensuring freshness, accuracy, and traceability from source through transformation to consumption
  • Partner with business stakeholders to translate domain expertise and institutional knowledge into structured, machine-readable metadata that AI systems can reason about without human intervention
  • Expand data ontologies with causal relationships, temporal logic, and policy constraints that improve AI accuracy and enable increasingly autonomous data investigation
  • Interface with upstream data teams to extract, transform, and load data from diverse sources using SQL, Python, and AWS technologies, unifying disparate learning platforms into a coherent analytical layer
  • Maintain pipeline infrastructure that keeps semantic layer content synchronized with evolving ETL logic, detecting drift between metric definitions and underlying data structures
  • Continuously reduce manual analysis by building toward natural language interfaces where stakeholders get answers directly from AI
  • Explore emerging techniques in knowledge representation, retrieval-augmented generation, and semantic data modeling to deepen AI-powered analytics capabilities

