
Staff Software Engineer, Systems Engineering Focus

Crusoe · San Francisco, CA
Full-time · On-site · Posted 2mo ago
Cross-functional Collaboration · Design Thinking · Helm · Kubernetes · Leadership · Linux

About the role

We are seeking a Staff Software Engineer to design, build, and scale Crusoe Cloud's customer-facing managed services. The Managed Platform Services team owns and operates multiple customer-facing products, with an active and growing roadmap.

This is a full software engineering role: you'll ship customer-facing features, build systems from 0 to 1, and scale existing services alongside a team of strong generalist engineers. What sets you apart is your systems programming expertise, which you'll bring as a technical authority on our Edge & Agents domain: the software that runs directly on customer infrastructure. This agent code is mission-critical; a failure here is a customer production incident. You'll provide deeper technical oversight on existing, in-flight, and planned edge systems while contributing broadly across the platform. This is a full-time position.

What You'll Be Working On:

- Customer-Facing Feature Development: Build and scale core platform services end-to-end, from greenfield 0-to-1 projects to systems handling growing production traffic.
- Edge & Agents Technical Oversight: Serve as the team's subject matter expert on edge software. Review existing agent architectures, provide technical guidance on in-flight designs, and shape how we build and operate software at the system level.
- Edge Agent Development: Build and maintain lightweight, high-reliability agents deployed on customer VMs. Minimize CPU/memory footprint without sacrificing observability coverage.
- Linux Kernel Metrics & eBPF: Instrument low-level system metrics using eBPF and procfs to power Crusoe's monitoring and telemetry pipeline.
- Packaging & Distribution: Own agent packaging and deployment via Helm charts, ensuring smooth delivery across customer environments.
- Pull-Based Scraping Architecture: Design and evolve the "pull" scraping logic that collects metrics from customer infrastructure with minimal operational overhead.
- Cross-Functional Collaboration: Partner with Control Plane, Storage, and SRE teams to ensure agent data feeds are reliable, well-structured, and operationally sound.
- Cross-Team Bridge: Serve as the technical bridge between the Managed Platform Services team and adjacent infrastructure teams, including SRE and Compute, who work on the cloud hypervisor and lower-level platform layers. Build enough expertise in their domains to collaborate effectively, represent your team's requirements clearly, and influence decisions that affect the edge agent stack.
- Technical Leadership: Set patterns and frameworks adopted across the team. Mentor senior engineers, contribute to architecture decisions, and help scope quarterly roadmap items with engineering and product leadership.

What You'll Bring to the Team:

- Systems Programming Expertise: Strong proficiency in Python, Go, and/or Shell scripting, with comfort working across languages as the problem demands.
- Linux Kernel Metrics: Experience instrumenting low-level system metrics. Comfort working at the procfs level.
- Kubernetes & Helm: Strong understanding of Kubernetes internals and experience packaging and deploying workloads via Helm charts.
- Operational Mindset: On-call experience on a customer-facing team is required. You proactively identify gaps in monitoring, alerting, and tooling, and close them.
- Reliability-First Engineering: You design for crash safety, low resource footprint, and graceful degradation. You think about what happens when things go wrong before writing a line of code.
- Scalable Design Thinking: You plan how systems evolve under traffic growth. You consider resiliency, high availability (HA), and disaster recovery from the start.
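To make the procfs and pull-scraping responsibilities above more concrete, here is a minimal Python sketch of the general pattern, not Crusoe's actual agent: it parses `/proc/meminfo` and serves the values over HTTP so a scraper can pull them on demand. It assumes the Prometheus text exposition format; the metric prefix `agent_meminfo_`, the `/metrics` path, the helper names, and port 9999 are hypothetical choices for illustration.

```python
"""Illustrative pull-style metrics endpoint backed by procfs (a sketch, not a product).

Assumptions: Linux host exposing /proc/meminfo; Prometheus text exposition format;
metric prefix, path, and port are hypothetical.
"""
import re
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: value}, converting kB to bytes."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Sizes are reported in kB (units of 1024 bytes); counters such as
        # HugePages_Total carry no unit and are kept as-is.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        values[key.strip()] = value
    return values


def render_prometheus(values: dict[str, int], prefix: str = "agent_meminfo_") -> str:
    """Render one gauge per field in the Prometheus text exposition format."""
    lines = []
    for key in sorted(values):
        # Metric names allow only [a-zA-Z0-9_:], so "Active(anon)" -> "active_anon".
        name = prefix + re.sub(r"[^a-zA-Z0-9_]", "_", key).strip("_").lower()
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {values[key]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Reads procfs fresh on every scrape, so the agent keeps no background state."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        with open("/proc/meminfo") as f:
            body = render_prometheus(parse_meminfo(f.read())).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9999) -> None:
    """Block forever, answering scrapes on the given (hypothetical) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` on a Linux host and then fetching `http://localhost:9999/metrics` should return gauge lines such as `agent_meminfo_memtotal` followed by a byte count. Reading procfs only when a scrape arrives is one way to keep the agent's CPU and memory footprint small; eBPF-sourced metrics would need a separate collector and are out of scope for this sketch.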

Additional Information

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack, from electrons to tokens, to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI. We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved: people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

