Senior Engineering Manager, Management Plane Systems
About the role
As we scale our AI infrastructure, we are investing deeply in the software systems that manage, observe, and heal our network at scale. We are hiring a Senior Engineering Manager, SDN Management Plane to lead the team responsible for the automation, observability, configuration management, and policy enforcement layer that runs across our entire network fleet.

This is a senior software engineering leadership role. The Management Plane is the horizontal layer that ties together our control and data plane systems, making our network self-aware, self-healing, and continuously verifiable. You will lead a team of senior and staff software engineers while remaining deeply engaged in platform architecture, systems design, and the technical roadmap.

This is not a network operations or SRE role. It is a platform engineering leadership position where your primary output is software: automation systems, observability pipelines, configuration management platforms, and the tooling that eliminates manual toil at scale. You will apply sound software engineering principles to hard networking problems, including the application of GenAI and machine learning to network operations.

What You'll Be Working On:

Platform Architecture & Engineering
Own the architecture, development, and production operation of Crusoe's SDN Management Plane, the automation and observability layer that manages our network fleet across all regions.
Build and operate CI/CD pipelines for network configuration: automated testing, policy validation, and push-on-green delivery of network changes from intent to production.
Design and implement the software systems that enforce reconciliation between declared and actual network state, detect configuration drift, and trigger automated remediation workflows.
Define provisioning and onboarding automation for new nodes, regions, and customer environments, ensuring consistent, policy-compliant network configuration at scale.
Observability and Intelligent Operations
Drive the design of network observability systems including streaming telemetry (gNMI/gRPC), synthetic probing, anomaly detection, and real-time traffic monitoring across GPU clusters.
Design and implement self-healing network capabilities: closed-loop automation with appropriate guardrails that detects, diagnoses, and resolves network faults without human intervention.
Set the technical vision for applying GenAI and machine learning to network operations, from intelligent anomaly detection to natural-language-driven network management.

Cross-Functional Partnership
Partner closely with Control Plane and Data Plane teams to ensure clean software interfaces between layers, and with infrastructure and compute teams to support GPU cluster networking requirements.
Act as the internal platform owner for network automation: treat other engineering teams as customers with real product requirements, not just consumers of scripts.

People Leadership
Lead, mentor, and grow a team of senior and staff-level software and network automation engineers.
Set technical standards, review architecture and design decisions, own team performance and development.
Foster a high-ownership engineering culture focused on shipping production software, not just maintaining tooling.

What You'll Bring to the Team:
10+ years of experience in network software engineering, network automation platform engineering, or infrastructure platform engineering.
5 to 7+ years managing senior and staff-level software engineers, with demonstrated ability to build and scale a platform team.
Proven track record of architecting and shipping production-grade automation and observability systems, not just configuring or consuming existing tooling.
Deep hands-on experience building network automation platforms: ar