Staff Software Engineer - (Linux & GPU Environment) - Substrate Team
Requirements
- Linux infrastructure: Built or improved Linux infrastructure at production scale and debugged issues from the kernel through to the application layer.
- Large-scale infrastructure: Operated infrastructure at scale and handled problems where the path forward was not obvious.
- GPU or specialised compute: Built GPU clusters, AI training infrastructure, or other large compute systems where capacity, cost, and performance all mattered.
- Hardware and bare metal: Worked with on-prem infrastructure, hypervisor configuration, or the physical layer of a compute platform.
- Ambiguous technical problems: Taken ownership of unclear problems, defined the options, and tested the right path forward.
- AI tools in engineering work: Used AI tools in real work and understand what changes when AI workloads become part of infrastructure.
Technical knowledge
- Linux kernel: Production knowledge of how Linux works.
- C and systems languages: Production experience, especially for kernel-adjacent work.
- GPU infrastructure: Cluster design, scheduling, and integration for AI workloads.
- Hypervisor and virtualisation: Experience configuring KVM and knowledge of how virtualisation works under
Additional Information
Join the team redefining how the world experiences design.

Hey, g'day, mabuhay, kia ora, 你好, hallo, vítejte! Thanks for stopping by. We know job hunting can be a little time consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.

Where and how you can work
Our flagship Sydney campus is uniquely Canva - an extension of our Surry Hills neighbourhood. It's a thoughtfully designed space with plenty of room to collaborate, focus, and connect. This role is based in Sydney, and we're looking for someone who calls it home. Our hybrid way of working gives you the flexibility to work remotely, and to come together on campus for meaningful in-person collaboration and connection when it matters most. We trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals.

What you'd be doing in this role
As Canva scales, change continues to be part of our DNA. But we like to think that's all part of the fun. So this will give you a flavour of the type of things you'll be working on when you start, but this will likely evolve.

The Runtime Platform Sub Group within Core Infrastructure Group keeps Canva's Linux fleet, GPU clusters, and compute layer running. Every backend service, training job, inference request, and machine that comes online depends on this work. It covers OS config for security, compliance, and performance, as well as hypervisor setup, provisioning, orchestration, and the tuning that keeps the platform fast and stable.

Canva has already moved much of its workload onto Kubernetes. The next phase is about making the compute layer stronger and more efficient. That includes GPU capacity for AI training and inference, the Linux systems underneath those clusters that make a diverse fleet of bare metal neoclouds consistent and operable at scale, the hardware labs in colos, and the path toward running Canva-owned GPUs at data centre scale. It also includes performance work at the machine level, where one change in the right place can save time and cost across the company.

This role sits in a small field of engineers who do this work well at scale. Canva is still building the tools, observability, and debugging systems that mature infrastructure teams take years to develop. The person in this seat will shape how Linux at Canva evolves over the next few years.

At the moment, this role is focused on:
- Linux orchestration at scale: Extend our CAPI interface to more providers, and tune how Linux orchestration works across cloud providers and on-prem and how the platform scales as Canva's compute footprint grows.
- GPU clusters: Support GPU clusters that run AI training and inference workloads at scale, including kernel tuning, scheduler decisions, and I/O paths that keep the hardware moving.
- Bare metal infrastructure: Build hardware configuration and hypervisor patterns that turn the lab into production infrastructure and support Canva-owned GPUs at data centre scale.
- Linux internals: Work on kernel-level changes, eBPF integration logic, and performance tuning at the box level.
- Technical direction: Set the patterns the wider infrastructure group follows, write design docs that guide decisions, and help shape how Linux is used at Canva.
- Hands-on engineering: Stay close to the code, ship to production, and make changes that improve how backend services and AI workloads run.

What success looks like
Canva's Linux platform supports GPU workloads across multiple cloud providers and the new bare metal infrastructure as a single system. The setup is consistent, the orchestration works cleanly, and the teams using it do not need to manage the underlying details.

You're probably a match
We'd like to hear from you if you meet some of these requirements. You do not need to meet all of them.