Architect and evolve core platform capabilities for reliability, including execution environments, CI/CD systems, and validation pipelines that support high-throughput, machine-assisted change.
Design and implement fast, ephemeral, and strictly isolated execution environments where generated work can be built, tested, and safely discarded at scale.
Transform CI/CD into a validation system by embedding automated verification (tests, integration harnesses, canarying, rollback signals) into promotion decisions.
Build production-like validation environments that allow realistic system behavior testing without impacting live systems.
Establish deep observability patterns for autonomous workflows, including tracing what ran, what failed, why, and what it cost across agents, tools, and orchestration layers.
Define and implement guardrails-as-code, including access controls, policy enforcement, cost protections, and auditability for platform usage.
Design for reliability from day one, including scalability, fault tolerance, performance optimization, and operational resilience.
Lead technical design reviews and influence platform and infrastructure decisions across engineering teams.
Define and document reusable infrastructure patterns, platform standards, and reference implementations that create a consistent paved path for teams.
What This Is Not
Not a ticket queue or generic support role.
Not incremental-only ops without ownership of architecture and adoption.
Not "just Kubernetes admin": Kubernetes is one layer in a broader platform problem.
Requirements
Typically 10+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or Platform Engineering.
Deep experience designing and operating distributed systems at scale, including cloud platforms (e.g., AWS), Kubernetes, and infrastructure-as-code.
Strong expertise in reliability engineering practices, including incident management, fault isolation, resiliency design, and system performance tuning.
Experience building and operating CI/CD systems, test harnesses, and automated validation frameworks.
Strong understanding of observability systems, including metrics, logging, tracing, and system-level debugging.
Demonstrated ability to define technical standards and influence multiple teams through architecture, design review, and strong engineering judgment.
Strong production mindset, with experience designing systems for scalability, availability, and operational efficiency.
Experience implementing secure, multi-tenant infrastructure with strong isolation, IAM, and secrets management practices.
Excellent cross-functional collaboration skills.
Growth mindset and One Team orientation.
Great to Have
Experience supporting AI/LLM-powered systems in production, including understanding of latency, cost, and orchestration challenges.
Experience designing high-throughput ephemeral compute systems or sandboxed execution environments.
Experience building internal developer platforms or platform-as-a-product capabilities.
Familiarity with governance or regulated environments.
Experience with advanced validation systems such as canarying, chaos engineering, or automated rollback strategies.
What Success Looks Like
Faster delivery through platform-enabled validation and automation.
Automated validation of changes before production, reducing reliance on manual review.
Platform standards adopted across teams as the default paved path.
Early detection of reliability issues through strong observability and validation systems.
Reduced infrastructure complexity so engineers can focus on product and policy.
Benefits
Purpose-led company with a values-focused culture - Best Life, One Team, Growth Mindset
Time Off - with our open PTO policy, you pick when you want time off when it's best for you!
Days of Disconnect - once a quarter, we take a collective breather and enjoy a day off together around the globe. #oneteam
Parental & Pawternity Leave
Paid time off
Parental leave
Additional Information
Join ABC Fitness, the leading technology provider for the fitness industry!
Join ABC Fitness and become part of a culture that's as ambitious as it is authentic. Let's transform the future of fitness, together!
Our Values
Best Life
We believe great work begins with great people. That's why our culture is built on respect, trust, and belonging. We create an inclusive environment where every team member can bring their authentic self to work, because diverse perspectives drive innovation and meaningful impact.
Growth Mindset
We are doers, thinkers, and dreamers. At ABC Fitness, your growth is our investment. Through continuous learning, mentorship, and professional development opportunities, we empower you to reach new heights, personally and professionally.
One Team
From day one, you'll be part of a team that collaborates, celebrates, and cares. We move fast, support one another, and have fun along the way. Because when you thrive, we all thrive.