Customer Reliability Engineer
ExternalPrepare for this interview
EliteAI-generated questions, company research, and talking points tailored to this role
About the role
At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world's largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies. Cloudflare protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare all have web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks. Cloudflare was named to Entrepreneur Magazine's Top Company Cultures list and ranked among the World's Most Innovative Companies by Fast Company. At Cloudflare, we're not looking for people who wait for a polished roadmap; we're looking for the builders who see the cracks in the Internet that everyone else has simply learned to live with. We value candidates who have the instinct to spot a "normalized" problem and the AI-native curiosity to create a solution using the latest tools. Our culture is built on iteration, leveraging AI to ship faster today to make it better tomorrow, while ensuring that every improvement, no matter how small, is shared across the team to lift everyone up. If you're the type of person who values curiosity over bureaucracy, and that AI is a partner in solving tough problems to keep the Internet moving forward, you'll fit right in. Available Locations: Austin, Texas Cloudflare built its reputation helping build a better Internet, defending millions of sites, giving away SSL and DDoS mitigation when the industry charged premium prices. In an acceleratingly dangerous world, the scope of that mission has changed. We are becoming something more: critical infrastructure. Banks run their payment rails on us. Governments run public services on us. Media companies depend on us during live events. Health systems depend on us to provide care. Reliability for these customers is no longer a feature of our product. It is a mission. Serving that customer base demands a different operating model. Traditional support organizations route tickets. Traditional engineering organizations ship features. Neither alone is enough when the stakes are this high. We are pivoting to something different: a customer-facing engineering organization, directly engaged with our customers at scale. This is work a central dev team cannot do from the inside of the network. The Customer Reliability Engineering function is the spine of that pivot. CRE is SRE applied outward, the same engineering discipline, applied to the reliability of the systems our customers run on Cloudflare. You are the engineer who owns the problems that matter most to the customers who matter most, and you contribute directly to our products and tooling, in partnership with Product Engineering, to hold that standard across the entire customer base. CRE is a rapid response team and a proactive engineering team. You fix things at the edge as they come up, and you help build the product capabilities that identify customer issues before they become a crisis. Both modes are equally core. Rapid response. When a customer issue surfaces that is high-severity, cross-layer and complex, you are the engineer who answers. You reproduce the defect, isolate the root cause across Cloudflare's infrastructure and the customer's stack, drive the fix with Product Engineering, and confirm resolution. You hold on-call for high-severity incidents as part of a global shift rotation. Proactive engineering. When no fire is burning, you work with Product Engineering and our platform teams to build the capabilities that make the next fire cheaper or unnecessary: telemetry pipelines that correlate signals across the customer base, detectors that fire before a human notices, diagnostic tooling that scales across hundreds of customers, automation that reduces toil for Customer Support. Every incident you carry generates engineering output that reduces the cost of recurrence. The work compounds. Cloudflare is building CRE as an AI-native function. You will work with and help build agents and tooling that pre-diagnose incidents, surface relevant logs and configuration, and propose fixes with cited evidence. Engineers who ship AI-assisted diagnostics are the ones defining this discipline. What You Might Work On Rapid response: Own a Sev-1 incident where a large financial services customer sees asymmetric latency from a single POP. Trace it through BGP routing and origin configuration. Produce the fix upstream. Diagnose a recurring WebSocket disconnect that a media customer has been fighting for weeks. Isolate it to a specific interaction between WAF and their origin load balancer. Drive the fix with Product Engineering. Partner with a government customer's SRE team during an acti
Benefits
Your Match
How well this role fits your profile.
Company Intel
What employees say
Worked at Cloudflare? Share your experience