Site Reliability Engineering Technical Leader (Data Center Network Services)

Cisco · Bangalore, India
Full-time · On-site · 6d ago
Agile · Ansible · BGP · CI/CD · Documentation · Git

Responsibilities

Minimum Requirements

  • Bachelor of Engineering or Technology with a minimum of 10 years of experience and demonstrated ability in designing and building scalable and reliable networking solutions specifically for AI/ML infrastructure and high-performance computing environments.
  • Strong work experience with Cisco Data Center Networking technologies
  • Strong programming skills and concepts to deliver networking technologies
  • Expertise with Continuous Integration and Continuous Delivery (CI/CD), including setting up CI/CD pipelines
  • Proficiency in Terraform and Ansible for Infrastructure as Code (IaC)
  • Experience with tools including Jira, Git, and Jenkins
  • Solid grasp of software engineering concepts including common data structures/standard algorithms, object-oriented design, distributed computing and cloud computing paradigms.
  • Expertise in AI Fabric with a deep understanding of high-performance networking for AI/ML workloads.
  • Experience managing networking for GPU cluster environments.
  • Ability to implement and utilize AI-based observability tools.
  • Ability to forecast infrastructure needs for scaling AI workloads and managing the lifecycle of hardware/software releases.
  • Experience with technologies such as routing, switching, Nexus, vPC, VDC, VLAN, VXLAN, and BGP
  • Experience with ACI networks.
  • Experience in creating documentation and training materials
  • Ability to work closely with Business Units to resolve hardware/software interoperability issues.

Requirements

  • Good understanding of build and release operations
  • Good understanding of DevOps principles
  • Comfortable with Agile practices and a belief in "quality-driven" development
  • Understanding of Unix/Linux
  • Domain knowledge about contemporary network technologies, network management and protocols
  • Experience with application/platform instrumentation, measurement, log data processing, and monitoring
  • CCNA or CCNP
  • Experience in managing Cisco Nexus Dashboard and APIC for centralized policy, monitoring, and fabric orchestration.
  • Experience with Nexus Dashboard Fabric Controller
  • Experience with VXLAN-based networks and troubleshooting

Why Cisco?

We are Cisco, and our power starts with you.

Additional Information

Senior AI Site Reliability Engineer

Meet the Team

We are the Data Center Network Services team within Cisco IT that supports network services for Cisco Engineering and business functions worldwide. Our mission is simple: build the network of the future that is adaptable and agile on Cisco's networking solutions. Cisco IT networks are deployed, monitored, and managed with a DevOps approach to support rapid application changes. We invest in transformative technologies that enable us to deliver services in a fast and reliable manner. The team culture is collaborative and fun, where thinking creatively and tinkering on new ideas are encouraged.

