Senior Engineering Manager, Managed Platform Services
About the role
Join Crusoe as a Senior Engineering Manager and lead a talented team focused on revolutionizing our cloud infrastructure. In this pivotal role, you'll lead the Command Center Insights & Actions team - building the systems that translate raw infrastructure telemetry into human-readable diagnostics and automated remediation workflows. You'll own a technical roadmap spanning alerting engines, heuristic development, node health systems, and state machines that trigger proactive maintenance without impacting customer workloads, while exploring the integration of Large Language Models (LLMs) to build cutting-edge AI solutions within our Command Center product.
This is a full-time opportunity for a passionate leader who thrives on building high-performing teams, fostering innovation, and delivering impactful, data-driven solutions in a dynamic environment.
What You'll Be Working On:
Drive the Insights & Actions Roadmap: Own and execute across alerting infrastructure, control plane APIs, automated action systems, and telemetry-derived insights such as straggler node detection and GPU profiling.
Influence Strategic Roadmaps: Contribute significantly to the team's roadmap, impacting long-term team goals and operational performance metrics.
Refine Early Product Requirements: Collaborate with product and engineering leadership to bring clarity to ambiguous problems early in the scoping process.
Collaborate Cross-Functionally: Partner with product, design, and engineering teams inside and outside the organization to align on goals and deliver integrated solutions.
Manage Complex Projects: Lead critical initiatives involving multiple engineers, including those outside your direct report structure, ensuring customer outcomes are auditable and decisions are data-driven.
Drive Technical Excellence: Champion process improvements, operational excellence, and best practices across the team.
Cultivate Team Growth: Coach and mentor engineers from new grad to Staff level, setting clear performance expectations and defining career paths to build a high-performing, sustainable team.
What You'll Bring to the Team:
Technical Expertise in Observability & Intelligence Systems: Hands-on background in ML, heuristics, or rule-based systems - with the ability to engage deeply on problems like anomaly detection, threshold design, and automated remediation logic.
Proven Leadership: Demonstrated track record of people management, leading with empathy, and maintaining a sustainable workload for your teams.
Technical Acumen: Ability to lead effectively in spaces where problems, opportunities, and strategies are not yet fully defined - driving clarity, direction, and execution.
Cross-Functional Collaboration: Excellent technical communication skills, both verbal and written, to work effectively across diverse roles and functions.
Project Ownership: Proven experience owning and delivering complex projects end-to-end, with measurable quality and data-driven decision-making.
Global Scale Experience: Background building and operating global services at scale.
Organizational Prowess: Highly organized and capable of managing multiple complex initiatives and team priorities in parallel.
Bonus Points:
Background in data platforms and data science
Background in observability platforms or products
Familiarity with GPU profiling tools (Nsight, NCCL Inspector) or infrastructure diagnostics at the hardware layer
Highly motivated and proactive in identifying process improvements and boosting team efficiency
Passion for coaching and mentoring engineers into high-performing individuals
Enthusiasm for building team culture with a high quality of life for engineers
A true "people-person" who thrives in collaborative environments and is energized by teamwork