Staff Engineer, CI/CD & Cloud Infrastructure
Responsibilities
- CI/CD & Build Engineering
- Design, build, and maintain CI/CD pipelines using GitHub Actions or similar platforms
- Manage build systems for Python, C/C++, and CUDA codebases on Linux
- Integrate build tools (CMake, Make, pip, setuptools) into automated pipelines
- Implement robust versioning, tagging, and artifact management strategies
- Ensure full traceability of builds, test results, and artifacts from commit to deployment
- Manage Docker-based build environments including base images, caching, and reproducibility
- Maintain and optimize build performance, parallelism, and reliability
- Cloud Infrastructure (AWS)
- Architect and manage complex AWS infrastructure, including:
  - IAM roles, policies, and access management
  - Storage services (S3, EBS, EFS) with tiered lifecycle policies
  - Databases (RDS, DynamoDB, or similar) with backup and failover strategies
  - Data workflow and pipeline engines (Step Functions, Airflow, or similar)
  - Compute services (EC2, ECS, EKS, Lambda) scaled to workload
- Implement infrastructure as code using Terraform
- Manage Kubernetes clusters and Helm charts for containerized workloads
- Design for scalability, high availability, and disaster recovery
- Manage cost optimization, resource tagging, and infrastructure governance
- Support multi-account and multi-region strategies as needed
- Familiarity with Azure and GCP for secondary or hybrid cloud use
- On-Premises HPC & Hybrid Infrastructure
- Provision, configure, and manage on-premises Linux HPC nodes used for secondary and tertiary data processing
- Define infrastructure-as-code (Terraform, Ansible, or similar) for reproducible HPC node provisioning and configuration
- Manage high-speed networking infrastructure between instruments, HPC nodes, and storage (configuration, monitoring, troubleshooting)
- Implement and manage shared storage systems (NFS, parallel filesystems, or similar) accessible to both local HPC and cloud compute
- Design and operate hybrid burst-to-cloud infrastructure: provision and manage AWS compute resources that extend local HPC capacity on demand
- Collaborate with the data pipeline team to ensure infrastructure meets throughput, latency, and reliability requirements
- Manage OS patching, driver updates, and GPU runtime environments across HPC nodes
- Monitor HPC cluster health, utilization, and capacity to inform scaling decisions
- Experiment Data Management & Pipelines
- Design and operate data ingestion pipelines for high-volume experiment data from lab instruments
- Implement tiered storage strategies (hot/warm/cold) to balance accessibility, performance, and cost
- Deploy and manage search infrastructure (Elasticsearch/OpenSearch) to make experiment data universally discoverable and queryable
- Build data cataloging and metadata tagging systems so datasets are well-organized and self-describing
- Integrate visualization tools (Grafana, Kibana, or similar) to enable engineers and scientists to explore and analyze experiment data
- Design data lifecycle policies including retention, archival, and compliance requirements
- Ensure data pipelines are reliable, idempotent, and observable with clear error handling and retry logic
- Work with engineering and science teams to define data schemas, access patterns, and query requirements
- Deployment & Release Engineering
- Own deployment workflows for software delivered to embedded instruments in our central lab
- Manage release processes for a small number of complex, high-value lab-operated instruments
- Design deployment strategies that account for rollback, validation, and minimal downtime
- Coordinate versioned releases across multiple software components and dependencies
- Support development, staging, and production environment parity
- Logging, Observability & Traceability
- Implement centralized log collection and aggregation across cloud and on-site systems
- Deploy and manage observability tooling (Prometheus, Grafan
Additional Information
Location: San Diego, CA
Job Type: Full-Time
Salary Range: $175,000-$185,000

Position Overview
We are looking for a Staff CI/CD & Cloud Infrastructure Engineer to own and evolve our build pipelines, deployment workflows, and cloud infrastructure. You will be responsible for ensuring that software (spanning Python, C/C++, and CUDA on Linux) is built, tested, versioned, and deployed reliably across both AWS cloud environments and a fleet of complex embedded instruments operated in our central lab facility.

This is a senior, hands-on role for an engineer who thrives at the intersection of DevOps automation, cloud infrastructure management, and release engineering. You will design and maintain CI/CD pipelines, manage complex AWS infrastructure as code, and ensure full traceability from source commits through builds, tests, artifacts, and deployments. You will work cross-functionally with firmware, application, and HPC engineers to keep the entire delivery pipeline fast, reliable, and observable.
Company: Foresite Labs (Stealth Co)