Principal Platform Engineer
About the role
As a Principal Platform Engineer at Gradial, you will shape the foundation our platform runs on as we scale. You will work closely with the CTO and engineering team to make our systems faster, more resilient, and easier to operate in a high-growth environment. This is a hands-on individual contributor leadership role for someone who wants real ownership, high leverage, and the opportunity to define what platform reliability looks like at an AI-native company.
Responsibilities
- Own the reliability, scalability, and operational health of Gradial's production platform.
- Lead the evolution of Kubernetes, CI/CD, observability, and infrastructure as code across the stack.
- Set the standard for how we design, ship, and operate reliable systems.
- Build the tooling and automation that help engineers move faster with more confidence.
- Drive improvements in monitoring, alerting, incident response, and service readiness.
- Partner with engineering to identify scaling risks early and solve them before they slow us down.
- Influence the long-term direction of our platform across reliability, security, performance, and cost.
Requirements
- 5+ years of experience in platform engineering, infrastructure, SRE, DevOps, or related roles with direct ownership of production systems.
- Proven success designing and operating production-grade infrastructure in fast-moving, high-growth environments.
- Deep expertise in Kubernetes, cloud-native architecture, and container orchestration.
- Strong experience with infrastructure as code, GitOps, CI/CD workflows, and modern deployment practices.
- Strong command of observability and reliability fundamentals across metrics, logging, tracing, alerting, and incident response.
- A track record of leading through influence, making sound technical decisions, and raising the bar across engineering teams.
- Familiarity with AI or ML infrastructure, including GPU provisioning, model deployment, or compute-intensive workloads.
- Experience supporting cloud or multi-cloud environments with a focus on resilience and scale.
- Comfort with TypeScript or Python for internal tooling and operational automation.
You'll thrive here if you...
- Embrace AI as a core tool for problem-solving, creativity, and scale.
- Show a strong work ethic, high ownership, and a bias toward action.
- Communicate with clarity and curiosity.
- Thrive in fast-paced, hyper-growth environments where building is always better than maintaining the status quo.
Additional Information
Gradial helps marketers and creatives move from idea to execution faster. Our platform turns intent into action, automating website updates, design system migrations, and ongoing content optimization while preserving brand integrity across every touchpoint. Backed by leading investors, we're building software that adapts to the user, not the other way around. We move with urgency, operate with ownership, and solve hard problems from first principles. If you want to do ambitious work, take real responsibility, and help define the future of AI-native content operations, you'll do your best work here.