Site Reliability Engineer
About the role
At coreflow, we're redefining entertainment with AI. We're one of the fastest-growing startups globally, serving 20 million users in our first year. We're an in-person company based in Sydney, Australia.
We're guided by a set of principles that are foundational to our work and drive every decision:
- User-first: We build things that people want. We invest time to understand our users and focus on adding value instead of extracting value.
- High agency, high ownership: We're responsible for the pieces we own, end-to-end. We own every mistake, figure out what went wrong, and fix it. We don't blame anyone or anything else.
- Urgency: This is a once-in-a-lifetime opportunity. We prioritize well, find ways to increase leverage, and move at an inspirational pace.
Responsibilities
- You'll be our first dedicated site reliability engineer, owning reliability and core platform decisions as we scale to hundreds of millions of users.
Example projects:
- Improve uptime and reduce RTO across critical services.
- Orchestrate and harden GPU clusters serving millions of AI generations per day.
- Implement platform-wide observability (metrics, tracing, alerting) and enforce SLOs.
- Optimize AWS infrastructure and reduce cloud spend without sacrificing performance.
Requirements
- 5+ years operating production systems at scale.
- Strong AWS experience (infra-as-code, high-scale compute, K8s/ECS or similar).
- Deep observability and incident response experience.
- CI/CD and deployment pipeline expertise.
- Familiarity with our tech stack: TypeScript, Next.js, React, TailwindCSS, tRPC, Postgres, Temporal, AWS.
- You write code and fix root causes, not just symptoms.
- Hunger to win. This is not going to be easy.