Staff Site Reliability Engineer
SimSpace serves as an AI Proving Ground where organizations can confidently train, test, and outmaneuver adversaries in any environment. Trusted by allied governments, militaries, enterprises, and research institutions worldwide, SimSpace enables adaptive, AI-ready defenses that stay ahead of evolving threats. Founded in 2015 by experts from U.S. Cyber Command and MIT Lincoln Laboratory, the platform unifies training, testing, and validation in a realistic, live-fire simulation, helping teams evaluate security investments, optimize performance, and compress cyber readiness cycles from months to days.

Why join SimSpace?

We are an organization focused on building our culture and mindfully enhancing our atmosphere every day, which is why we have collaborated on an integral value system. Our governing philosophy of being Human Centered is deeply embedded within our value system. We apply this philosophy to every one of our internal team members, external clients, and their customers.

How Do We Work?

We believe that people are at the center of everything we do. SimSpace fosters a culture of continuous learning, curiosity, and professional growth. That belief shows up in action: in-house training, internal and external learning platforms, cyber conferences, industry events, and dedicated time for skill development. Our people are empowered to shape their careers, and it shows. Year over year, SimSpace consistently outperforms industry benchmarks in internal mobility, promotions, and total rewards growth.

Who Thrives Here?

We are a team of innovators, protectors, and problem-solvers. We believe diversity of thought and experience fuels better solutions, and we're committed to building teams that reflect the communities we serve. Whether you're remote or office-based, you'll collaborate with talented colleagues across departments and time zones, united by the mission to create a safer digital world. We invite you to apply today!
About the Role

We are looking for a Staff Site Reliability Engineer to define the technical vision, lead the architecture, and secure the infrastructure that powers the SimSpace cyber range platform. The ideal candidate is a deeply experienced SRE and exceptional software engineer who thinks strategically about distributed systems, reliability, and operability at a global scale. At the Staff level, you will act as a force multiplier, architecting resilient systems, driving engineering standards, and solving our most complex infrastructure challenges rather than relying on manual processes or localized fixes.

In this position, you'll provide overarching technical leadership across our SRE practice, bridging traditional site reliability, DevOps, and DevSecOps. You'll architect the systems and strategies that allow SimSpace to deliver software seamlessly across our own data centers, to customers who bring their own hardware, and as pre-packaged appliances with bundled hardware and software. As our on-premises product matures and scales, you will design the long-term automation frameworks that make these varied deployments robust, secure, and repeatable.

What will you be doing as a Staff SRE at SimSpace?

Technical Strategy & Architecture: Design and architect the overarching infrastructure strategy that enables consistent, repeatable, and secure deployments across SimSpace-hosted data centers, customer-provided hardware, and highly restricted air-gapped environments.

Platform Evolution & Configuration Management: Lead the evolution of our CI/CD and Kubernetes platforms. Drive advanced application packaging, templating, and configuration management strategies using Jsonnet and Grafana Tanka (alongside Kustomize). Move beyond maintaining pipelines to architecting multi-cluster, multi-environment deployment frameworks that drastically improve developer velocity.
Reliability Leadership: Define, measure, and govern Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets across the engineering organization. Partner with product and engineering leadership to balance feature delivery with platform stability.

Advanced Observability: Architect our enterprise observability strategy using the Grafana stack. Design frameworks for proactive monitoring, complex anomaly detection, and distributed tracing that give teams unparalleled visibility into system health, pod scaling, and latency bottlenecks.

Security & Compliance Architecture: Drive the infrastructure security posture at an architectural level. Embed advanced container security, zero-trust network segmentation, and automated compliance policies directly into our deployment pipelines and runtime environments.

Cross-Functional Enablement: Serve as a strategic partner and consultant to development teams. Advocate for an "SRE culture" by designing self-service tooling, establishing "paved roads" for developers, and reducing operational toil across the entire engineering org.

Incident Command: Act as an Incident