Site Reliability Engineer II

Todyl · Denver, CO

Full-time · Remote · 1mo ago

AWS · CI/CD · Compliance · Documentation · Kubernetes

About the role

At Todyl, we are on a mission to protect small and medium-sized businesses from ever-changing cyber threats. The Todyl platform fully integrates threat, risk, and compliance management to provide exceptional, affordable, unified cybersecurity solutions to MSPs (Managed Service Providers) and their end customers. At the end of the day, we're here to keep our partners and customers safe and help them manage the risks and comply with regulations. Protecting others requires a team that works together with trust and cares deeply about carrying out our mission.

The Site Reliability Engineering team at Todyl exists to make our platform reliable, secure, and easy for engineering teams to ship to. We do that by building automation, self-service tooling, and operational standards that let developers move fast without putting customers at risk. Our success is measured by how much production reliability and developer velocity we enable, not by how much work flows through us.

You'll spend your time building the tooling and platform capabilities that let engineering teams deploy, scale, and configure their services without having to file a ticket with us. You'll partner closely with developers, take operational reliability seriously, and bring an automation-first mindset to a platform that handles security workloads at the heart of our product.

In this role, we're looking for someone who:

Has a bias for action and a strong sense of ownership. They finish what they start and stay with the work through stabilization, not just through a successful deploy.
Sees SRE as a service to the engineering organization, not a gate. They build trust with developers and make other teams' jobs easier.
Treats security as a normal part of platform operations, not an afterthought, and brings a growth mindset to security regardless of starting expertise.
Gets energized by eliminating toil. They look at repetitive work and ask, "How do we make this go away?"
Actively uses AI tooling in their day-to-day work and is curious about where it goes next.
Can communicate technical decisions clearly to engineers and non-engineers, and is comfortable saying no or pushing back constructively when it matters.

Responsibilities

You'll build and operate the production platform, including Kubernetes, CI/CD pipelines, infrastructure-as-code, observability, secrets management, and the AWS foundation on which our services run.
You'll automate the path to production, investing in self-service capability so engineering teams can deploy and scale without depending on you for routine work. We're shifting from reactive to proactive, and we'd rather build guardrails than approve every deploy.
You'll drive cost visibility and efficiency across our cloud footprint, including AWS resource tagging, COGS attribution, and right-sizing across the platform.
You'll modernize how we run on-call: living runbooks, alerting we trust, and post-incident reviews as a normal part of how the team operates.
You'll embed security into day-to-day operations through patching, access controls, secrets rotation, and dependency hygiene, as part of the platform you operate rather than a separate workstream.
You'll partner with product teams early on reliability for high-stakes projects, helping shape the design rather than reviewing it the week before launch.
You'll participate in a weekly on-call rotation, resolve most issues independently, and update documentation after incidents.
You'll plan and estimate honestly. Break work into smaller increments, communicate delays early, and write tests for the automation you build because it runs in production.
You'll treat code review as a quality lever, not a checkbox. Catch missing tests, push back on tech debt, and watch dashboards and logs to verify your own changes after they ship.
You'll mentor less-tenured teammates through pairing, documentation, and the example you set. The team has engineers at different stages, and we expect knowledge to flow across them.
When something you've built is mature and stable, you'll look for ways to hand it off or make it self-managing rather than holding onto it forever.
We don't expect deep knowledge across every item below, but familiarity with several of these will help you ramp quickly. Most importantly, we're looking for a strong technical background and the willingness to learn what you don't already know.
Kubern
