Staff Data Engineer - TS/SCI Cleared

Twenty · Arlington, VA

Full-time · On-site · 8mo ago

Airflow · Apache · AWS · BigQuery · Clustering · Data Modeling

About the role

America is under sustained cyber attack. Our adversaries infiltrate our networks, steal our IP, and degrade the digital infrastructure that modern life runs on. They've learned, correctly, that those attacks rarely produce consequences. Twenty was founded to change that by making our adversaries think twice before they attack us. Our vision is American and allied primacy in cyberspace: a future where they cannot contest us, deterrence is assured, and the free world remains secure.

Founded in 2024, Twenty Technologies (www.twenty.io) industrializes offensive cyber operations for the U.S. and its allies. Headquartered in Arlington, Virginia, Twenty has raised $38M from Caffeinated Capital, General Catalyst, and In-Q-Tel.

Role Summary

You will own the data infrastructure that powers Twenty's cyber operations applications and capabilities. This role is about building a durable, high-performance data lake and the pipelines, schemas, and query patterns that make petabyte-scale datasets usable and economical. You'll partner closely with engineers and intelligence analysts to turn messy, high-volume operational data into reliable, well-modeled systems that drive real missions. You'll also lead technical initiatives and mentor other engineers as we scale what we can support and ship.

Responsibilities

Lead the development and operation of a data lake for cyber operations and intelligence data.
Design schemas, partitions, and indexes that make complex datasets performant and cost-effective to query.
Partner with engineers and intelligence analysts to define query patterns and data products for mission use cases.
Build and evolve ETL pipelines that are observable, recoverable, and resilient to upstream change.
Drive technical initiatives end-to-end, from architecture decisions through production rollout and iteration.
Establish best practices for data quality, documentation, and operational ownership across the platform.
Mentor engineers on data modeling, performance tuning, and production-grade pipeline design.
Identify bottlenecks in storage/compute/query layers and ship improvements with clear performance wins.

Requirements

You think in systems: data modeling, storage formats, compute engines, and access patterns all have to fit together.
You're opinionated about schema and index design, and you can explain tradeoffs clearly.
You default to measurable reliability: data quality, lineage, repeatability, and operational excellence.
You're comfortable working with ambiguous datasets and evolving requirements without lowering standards.
You collaborate tightly across roles, especially with engineers and analysts who need fast, correct answers.
You take leadership seriously: mentoring others, raising the bar, and driving initiatives to completion.
You're motivated by national security outcomes and want your work to matter in the real world.
You have 8+ years of experience in data engineering and/or data architecture.
You have mastery-level expertise building ETL pipelines and operating them in production.
You have deep experience with data lake architecture and systems used to query data lakes.
You have strong schema and index design skills, including partitioning, indexing, and clustering strategies.
You have experience with column-oriented databases in production environments.
You have built data systems from scratch (not only maintained existing platforms).
You have proven leadership experience mentoring engineers and driving technical initiatives.
You are a U.S. citizen and can meet the role's security requirements.
You have experience with key-value datastores.
You have worked with streaming and message queue systems.
You have experience with graph database technologies.
You have worked with internet/networking datasets (e.g., scan data, DNS, netflow, certificates).
You have experience supporting analysts or operational users with high-stakes data needs.

Tech Environment (You Might Work With)

Data lakes: Apache Iceberg, Delta Lake, Apache Hive
Query engines: Trino, Presto, AWS Athena, Apache Spark
Column stores: ClickHouse, Amazon Redshift, Google BigQuery
ETL / orchestration: Airflow, AWS Glue, NiFi, ClickPipes
Streaming / queues: Kafka, RabbitMQ, NATS, AWS Kinesis
Graph: Neo4j, AWS Neptune, Memgraph, Apache AGE

Security / Work Environment

This role requires an active TS/SCI security clearance with appropriate polygraph and the ability to maintain it. This role is on-site in Arlington, VA with occasional travel to Fort Meade, MD.

Benefits

What's on the table:

Health. Medical, dental, and vision plan options. Life / AD&D, disability coverage options.
Family. Paid parental leave for eligible full-time employees. 12 weeks for birthing parents, 4 for non-birthing parents, 6 weeks for adoptive, foster, or intended parents through surrogacy.
Vacation. Paid holidays and flexible PTO. Take what you need.
Retirement. 401(k) with pre-tax and Roth options.

Health insurance · Dental insurance · Vision insurance · 401(k) · Paid time off · Flexible schedule · Parental leave
