Senior Site Reliability Engineer (SRE), Multi-Cloud, Experience: 7-12 Years
External | Full-time | On-site | Posted today
Ansible, AWS, Azure, BGP, GCP, Grafana
Responsibilities
- Build Reliable Infrastructure: Design and operate highly available infrastructure across AWS (EKS), Azure (AKS), and GCP (GKE) regions.
- Observability & Monitoring: Architect comprehensive monitoring, alerting, and dashboards using Prometheus, Thanos, and Grafana.
- AI-Powered SRE: Build MCP servers and AI agents to detect anomalies, automate troubleshooting, and enable self-healing.
- Automation Excellence: Write Python/Go code to automate operations, reduce manual toil, and improve system efficiency.
- Incident Response: Lead incident response, conduct root cause analysis (RCA), and participate in a 24x7 on-call rotation to maintain round-the-clock availability.
Requirements
- 7-12 years of experience in SRE, DevOps, or infrastructure engineering with a focus on reliability at scale.
- Proficiency in programming with Python or Go.
- Expertise in Linux internals (networking, filesystems, memory management) and container orchestration (Kubernetes/EKS/AKS/GKE).
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Terragrunt, or Ansible, and with public cloud providers (AWS, Azure, or GCP).
- Willingness to participate in a 24x7 on-call rotation.
- Strong understanding of networking fundamentals (TCP/IP stack, BGP, IPsec, VPN).
- Experience managing multi-cloud operations across AWS, Azure, and GCP simultaneously.
- Knowledge of high-performance networking or VPP (Vector Packet Processing).
- Experience with secrets and access management (IAM roles, RBAC, service accounts).
- Familiarity with building AI-native SRE tools or MCP server development.
Why Cisco?
We are Cisco, and our power starts with you.
Additional Information
Meet the Team
Join the founding SRE team building Cisco's next-generation multi-cloud fabric (MCF) platform from the ground up. We are a high-impact, 17-person global team (spanning the US, Europe, and India) dedicated to ensuring the reliability of a platform designed to handle billions of daily transactions across multiple global regions. You will collaborate closely with Backend, DevOps, and QA teams to pioneer the future of multi-cloud connectivity.