Software Engineer - Site Reliability Engineering
About the role
Automate for insight and scale: Build systems that make troubleshooting fast, safe, and scalable across thousands of Neo4j instances. From internal tools that surface clear insights to canaries that support safe rollouts, you'll focus on automation that elevates reliability engineering.
Treat operations as a software problem: Replace tribal knowledge and ad-hoc scripts with tools and systems that codify best practices, making operations predictable, scalable, and repeatable.
Design for resilience, learn from failure: Own and evolve the tooling and processes behind incident response. From clear alerts to blameless reviews, you'll help ensure teams respond with confidence and learn with clarity.
Champion reliability as a product feature: Help teams define and act on SLIs and SLOs, turning reliability into a shared, data-driven priority across engineering.
Create signals, not noise: Shape an observability stack that tells us what matters, when it matters, so we can detect issues early and resolve them quickly.
We're interested in hearing from engineers with deep experience in some of the following areas:
Writing backend tools and automation in Go (our primary language), with an emphasis on sound architecture, testing, and maintainability. Strong software skills in other languages, like Python, are also welcome.
Applying SRE practices in real-world environments: defining SLIs and SLOs, reducing toil through automation, and driving reliability through engineering.
Collaborating with other teams to promote SRE thinking, educating on principles like observability, ownership, and service level objectives.
Troubleshooting large-scale, cloud-based systems with confidence and curiosity.
Monitoring distributed systems and understanding their performance characteristics.
Designing systems with reliability, safety, and debuggability as first-class concerns.
Working with observability tools like OTel Collector, Prometheus, Grafana, and Google Cloud's operations suite.
Deploying and managing applications on Kubernetes; cluster-level administration is a plus.
Managing infrastructure with Kustomize and Terraform, keeping it clear, modular, and easy to evolve.
Building and maintaining CI/CD workflows (ours run on GitHub Actions).
Participating in on-call rotations and incident response with a focus on improvement, not blame.
Writing and contributing to postmortems that lead to meaningful, lasting changes.
#Li-Hybrid
Why Join Neo4j?
Neo4j is, without question, the most popular graph intelligence platform in the world. We have customers in every industry globally, and our products have proven product/market fit. Joining our team is an opportunity to shape the future of data and analytics. Below are just a few exciting facts about Neo4j.
Neo4j is one of the fastest-scaling technology companies in this industry. It recently surpassed $200M in annual recurring revenue (ARR), doubling its ARR over the past three years.
Raised the biggest funding round in database history ($325M Series F).
Backed by world-class investors l
Additional Information
About Neo4j
Neo4j is the graph intelligence platform that transforms data into knowledge to power the next generation of intelligent applications and AI systems. It includes enterprise-ready knowledge graphs for accurate, explainable, and governed AI; the most comprehensive, trusted, and easy-to-deploy graph capabilities across any environment and data source; and an unmatched ecosystem trusted by 84 of the Fortune 100 and supported by the world's largest graph community. Intelligence that works. Results that matter. Built to work everywhere and integrate with everything across every cloud for dynamic, personalized, and autonomous AI systems. We deliver quicker results, contextual knowledge, and solutions that impact customers and employees across the business.
Our Vision
At Neo4j, we have always strived to help the world make sense of data. As business, society and knowledge become increasingly connected, our technology promotes innovation by helping organizations to find and understand data relationships. We created, drive and lead the graph database category, and we're disrupting how organizations leverage their data to innovate and stay competitive.
The Team
The Site Reliability Engineering team's mission is to improve the reliability of Neo4j's DBaaS product: Neo4j Aura. Operating at a global scale across all three major cloud providers, Aura runs hundreds of Kubernetes clusters and hosts thousands of Neo4j instances in production at any given time.
We're reshaping what SRE means at Neo4j Aura, and we want you to be part of that journey. Rather than firefighting or chasing alerts, we're helping teams design for reliability from day one. That means building the tools, practices, and culture that embed SRE principles at the heart of how Aura operates. You'll be joining a team focused on long-term resilience, engineering excellence, and meaningful collaboration with product teams.