Site Reliability Engineering Technical Leader (Data Center Network Services)

Cisco · Bangalore, India

Full-time · On-site · 6d ago

Agile · Ansible · BGP · CI/CD · Documentation · Git

Responsibilities

Minimum Requirements
Bachelor of Engineering or Technology with a minimum of 10 years of experience and demonstrated ability in designing and building scalable and reliable networking solutions specifically for AI/ML infrastructure and high-performance computing environments.
Strong work experience with Cisco Data Center Networking technologies
Strong programming skills and concepts to deliver networking technologies
Expertise with Continuous Integration and Continuous Delivery (CI/CD), including setting up CI/CD pipelines
Proficiency in Terraform and Ansible for Infrastructure as Code (IaC)
Experience with tools including Jira, Git, and Jenkins
Solid grasp of software engineering concepts including common data structures/standard algorithms, object-oriented design, distributed computing and cloud computing paradigms.
Expertise in AI Fabric with a deep understanding of high-performance networking for AI/ML workloads.
Experience managing networking for GPU cluster environments.
Ability to implement and utilize AI-based observability tools.
Ability to forecast infrastructure needs for scaling AI workloads and managing the lifecycle of hardware/software releases.
Experience with technologies such as routing, switching, Nexus, vPC, VDC, VLAN, VXLAN, and BGP
Experience with ACI networks.
Experience in creating documentation and training materials
Ability to work closely with Business Units to resolve hardware/software interoperability issues.

Requirements

Good understanding of build and release operations
Good understanding of DevOps principles
Comfortable with Agile practices and a belief in "quality-driven" development
Understanding of Unix/Linux
Domain knowledge about contemporary network technologies, network management and protocols
Experience with application/platform instrumentation, measurement, log data processing, and monitoring
CCNA or CCNP
Experience in managing Cisco Nexus Dashboard and APIC for centralized policy, monitoring, and fabric orchestration.
Experience with Nexus Dashboard Fabric Controller
Experience with VXLAN-based networks and troubleshooting

Why Cisco?
We are Cisco, and our power starts with you.

Additional Information

Senior AI Site Reliability Engineer

Meet the Team

We are the Data Center Network Services team within Cisco IT that supports network services for Cisco Engineering and business functions worldwide. Our mission is simple - build the network of the future that is adaptable and agile on Cisco's networking solutions. Cisco IT networks are deployed, monitored, and managed with a DevOps approach to support rapid application changes. We invest in transformative technologies that enable us to deliver services in a fast and reliable manner. The team culture is collaborative and fun, where thinking creatively and tinkering on new ideas are encouraged.
