Industrial AI Cloud - Network Engineer (REF5506H)
Responsibilities
- Coordinate Operations with the Data Center, IaaS & PaaS layers: Coordinate and support network lifecycle activities (installs, upgrades, changes, firmware updates) and manage network interconnections and related documentation.
- Switch & Firewall Management: Provision and maintain InfiniBand switches according to ITIL Standards
- Automation: Develop and maintain automation scripts to orchestrate the overall scope, including fine-tuning and configuration changes throughout the project lifetime.
- OS & Firmware Management: Maintain network-based environments, apply patches, and manage firmware upgrades at scale.
- Monitoring & Observability:
- ITIL Processes: Follow and improve incident, problem, and change management workflows; document runbooks and standard operating procedures. Adhere to ZERO Outage guidelines.
- Cross-Team Collaboration: Work closely with Platform Engineers and AI solution teams to ensure smooth deployments and operations.
- Manage High-Speed Fabric: A unified network fabric utilizing both InfiniBand and Ethernet / RoCE technologies.
- Management Network: A separate 1 Gbps Ethernet and serial console for out-of-band (OOB) network management.
- PE/CE datacenter connectivity: CE routers, firewalls
Benefits
Additional Information
NVIDIA and Deutsche Telekom are jointly developing the world's first industrial AI cloud for European manufacturers. This AI factory in Germany will host 10,000 GPUs across NVIDIA DGX B200 systems and RTX Pro Servers. Deutsche Telekom provides secure, sovereign and fast infrastructure, including data centers, operations, security, and AI solutions.
Role Overview
We are seeking a Network Engineer to join a new networking team responsible for automating and operating network components such as switches, firewalls, routers, and border gateways as part of the core environment of the Industrial AI Cloud. In this role you will provision and manage the above-mentioned stack, implement and fine-tune monitoring, and deploy additional components if necessary. You will work with and coordinate between multiple teams (such as Infrastructure and Platform) to deliver and continuously improve infrastructure services following ITIL processes.
Detailed scope of the operations:
- 98x MQM9700-NS2F (InfiniBand 400G)
- 138x SN2201 (1G Spectrum based Ethernet switch, Cumulus OS)
- 8x SN5400 (100G Spectrum-3 based Ethernet switch, Cumulus OS)
- 101x SN5610 (800G Spectrum-4 based Ethernet switch, Cumulus OS)
- 4x FortiGate FG-201G
- 2x FortiGate 4801F-EU
- 2x Border Gateways - Cisco CR-8608 or Juniper PTX 10004
- 2x NVIDIA UFM appliance
Proprietary technologies used for managing the above scope: InfiniBand, Cumulus OS, RoCE, UFM, FortiGate firewalls, Cisco border gateways.