Site Reliability Engineer II
About the role
Join Zuora's high-impact Operations team and help power the backbone of our industry-leading SaaS platform. In this role, you'll help ensure the reliability, scalability, and performance of Zuora's global production environment while building the next generation of intelligent operations. We're looking for an engineer who enjoys solving complex infrastructure challenges, embraces an automation-first mindset, and is excited about applying AI and modern cloud technologies to improve operational excellence.
You'll have the opportunity to:
- Design and implement intelligent automation for infrastructure lifecycle management, including self-healing, anomaly detection, and automated remediation using Infrastructure as Code (IaC) and AI-driven tooling.
- Apply AI/ML techniques for predictive monitoring and proactive performance optimization to identify issues before they impact customers.
- Lead complex incident response efforts and root cause analyses, embedding automation and continuous learning into operational processes.
- Improve system reliability through dynamic scaling, telemetry instrumentation, and automated performance tuning.
- Enhance operational runbooks and playbooks by eliminating manual processes through automation.
- Evaluate and adopt emerging AIOps, cloud-native, and distributed systems technologies to continuously improve our platform.
- Partner cross-functionally with Product Engineering, Customer Support, Global Services, Deal Desk, and Sales to deliver exceptional customer experiences.
Our technology stack includes Linux, Python, Docker, Kubernetes, AWS, Kafka, ActiveMQ, MySQL, Oracle, Redis, Tomcat, Jenkins, Terraform, GitOps, Ansible, Puppet, Prometheus, Grafana, OpenTelemetry, Debezium, Web Application Firewalls, and Load Balancers.
About You
You're passionate about building reliable systems, automating repetitive work, and continuously improving how infrastructure operates. You enjoy troubleshooting complex technical problems, collaborating across teams, and learning new technologies.
We're looking for someone with:
- 2-4 years of experience in Linux systems administration and/or Python development in production environments.
- Strong Linux administration skills, including troubleshooting, service management, performance tuning, and networking fundamentals.
- Experience developing Python scripts or lightweight applications to automate operational workflows and system management.
- Hands-on experience with Docker and familiarity with Kubernetes concepts, including deployments, services, and scaling.
- At least one year of experience supporting SaaS or cloud-native production environments.
- Working knowledge of messaging platforms and databases such as Kafka, Redis, MySQL, or similar technologies.
- Experience contributing to CI/CD pipelines and deployment automation.
- Hands-on experience with monitoring and observability platforms such as Prometheus, Grafana, or similar tools.
- Experience participating in incident response, post-incident reviews, and root cause analysis.
- A demonstrated passion for automation and improving operational efficiency.
Zuora's Operations team is responsible for keeping our global SaaS platform running reliably, securely, and at scale. We combine operational excellence with engineering best practices to build resilient systems that enable our customers to succeed. Our team believes that the best operations are automated, observable, and continuously improving. We invest heavily in modern cloud infrastructure, AI-driven operations, and engineering innovation to reduce manual work, improve reliability, and empower our engineers to solve meaningful technical challenges.
If you're excited about building intelligent infrastructure, driving automation, and shaping the future of cloud operati
Requirements
- Experience with Jenkins, Terraform, GitOps, or advanced Infrastructure as Code practices.
- Exposure to AI/ML technologies for anomaly detection, predictive operations, or intelligent automation.
- Relevant certifications such as RHCSA, AWS/Azure/GCP certifications, PCAP (Python), Docker Certified Associate (DCA), Certified Kubernetes Administrator (CKA), or SRE-related certifications.
Additional Information
Costa Rica
About Zuora
At Zuora, we help businesses grow smarter and adapt faster. Our platform powers modern business models - from subscriptions and usage-based pricing to AI-driven and outcome-based offerings - helping companies launch new products, automate complex billing, and unlock predictable, recurring revenue. We've led the Subscription Economy for more than a decade. Now we're evolving again by building the definitive platform for quote to cash and helping companies monetize their products and services with an adaptable, AI-ready foundation.
This is a location-specific position that requires you to come into the office regularly to be most effective.
- Zuora Costa Rica office (Heredia): hybrid model with 3 days in office and 2 days remote.