Senior Site Reliability Engineer

Parallel Domain · Remote
Full-time · Remote · 2mo ago
AWS · Kubernetes · REST

About the role

Before an autonomous vehicle navigates a busy intersection, before a robot learns to pick and place in a warehouse, before any Physical AI system is trusted in the real world, it has to prove itself in ours. Parallel Domain builds the platform that validates the next generation of autonomous systems in high-fidelity virtual environments, and the infrastructure underneath that platform is what makes simulation at scale possible.

We're hiring a Senior Site Reliability Engineer to help build and operate that infrastructure. This role sits at the core of how we run large-scale, distributed simulation workloads for autonomous-systems testing and validation. You'll work across multi-region AWS infrastructure, operate Kubernetes at scale, and contribute directly to the reliability, security, and deployment systems that the rest of the engineering org depends on.

This is a hands-on role with the broad ownership typical of a startup. You'll partner closely with platform, simulation, and ML teams to keep the system running smoothly and evolving. We're growing the team (two of these roles are open), and the work is substantive: multi-region GPU scheduling, Windows workloads on Kubernetes, large-scale batch simulation, and an enterprise product direction that will require rethinking parts of how we deploy and operate.

Requirements

  • Experience. 5+ years in SRE, DevOps, or infrastructure engineering roles, with a track record of operating production systems across multiple regions.
  • Terraform. Modules, state management, and multi-environment patterns.
  • AWS depth. Solid experience across VPC, IAM, EKS, S3, and CloudWatch.
  • Kubernetes expertise. Cluster operations, autoscaling, RBAC, and Helm.
  • CI/CD and GitOps. Experience with GitHub Actions, ArgoCD, or similar workflows.
  • Networking fundamentals. CIDR, DNS, load balancing, VPN, and cross-region connectivity.
  • Observability. Experience with tooling such as Prometheus and Grafana.
  • Scripting. Comfort with Python and Bash for tooling and automation.
  • Cross-platform familiarity. Working knowledge of both Linux and Windows environments. Operational experience supporting Windows-based workloads is a meaningful advantage.
  • Pragmatism and ownership. Comfortable in a fast-moving startup with evolving priorities. You take ownership of systems while collaborating closely with other teams, and you're pragmatic about tradeoffs between speed, reliability, and complexity.
