Head of AI Inference & MLOps

Deeter Analytics · Austin Area
Full-time · On-site · 3mo ago
Observability · Routing

About the role

We need a senior operator-builder who can sit at the intersection of:

  • AI infrastructure
  • inference performance engineering
  • model serving and routing
  • marketplace monetization
  • customer / partner integration
  • revenue optimization

You will design and run the inference platform that determines how our GB300 NVL72 racks are monetized in the real-time market. That may include direct enterprise workloads, marketplace distribution, API-based reselling, model hosting, fine-tuned/private deployments, and emerging inference channels.

You should know what makes money on modern inference hardware, what does not, and why. You should be able to answer questions like:

  • Which open-weight and commercial-compatible models should run on this hardware first?
  • How should workloads be split between premium low-latency serving, bulk throughput, reserved capacity, and experimental capacity?
  • Should we route through third-party marketplaces, sell directly, or do both?
  • What software stack gives us the best performance per watt, per GPU, and per dollar of capex?
  • How do we maximize realized revenue rather than theoretical benchmark performance?
  • How do we scale from a 7MW launch to a repeatable 50MW AI factory operating model?

Responsibilities

  • Build and lead the inference monetization strategy for our first 7MW deployment and expansion to 50MW
  • Define the technical and commercial operating model for turning GB300 NVL72 racks into revenue-producing assets
  • Evaluate and implement the model serving stack, scheduling layer, inference engine, observability stack, and API platform
  • Select and optimize the mix of workloads across:
      • real-time inference
      • reasoning workloads
      • premium low-latency API traffic
      • batch / overflow workloads
      • dedicated enterprise deployments
      • private/fine-tuned model hosting
  • Identify the best go-to-market channels for capacity monetization, including direct sales and marketplace/API distribution partners
  • Own benchmarking methodology based on actual profit and production metrics, not vanity metrics
  • Drive workload placement decisions based on revenue per rack, revenue per GPU-hour, revenue per MW, latency targets, and customer value
  • Partner with datacenter engineering, networking, and facilities teams to ensure the physical plant supports the intended software monetization strategy
  • Build pricing, SLAs, utilization strategy, and customer segmentation framework
  • Create dashboards and control systems for:
      • utilization
      • queue health
      • latency
      • token throughput
      • margin by workload
      • failure rate
      • realized revenue by cluster / rack / model / customer
  • Lead decisions around multi-tenant vs single-tenant deployments, reserved vs on-demand capacity, and when to prioritize direct contracts over marketplace traffic
  • Build and manage the team required to scale this function over time

What Success Looks Like

In the first 3-6 months, you will:

  • Stand up a production inference platform for our initial GB300 NVL72 deployment
  • Recommend the highest-value initial workloads and monetization channels
  • Launch a repeatable commercialization strategy for rack capacity
  • Establish a clear performance and revenue measurement framework
  • Identify where we should sell capacity: direct, through marketplaces, via strategic partners, or through a hybrid approach
  • Turn the first cluster into a measurable cash-generating operation

In the first 12 months, you will:

  • Build the operating playbook for scaling from 7MW

Benefits

Health insurance

Additional Information

Location: Austin, Texas area / On-site preferred
Project: 7MW Phase I AI Datacenter -> 50MW Campus Expansion
Reports to: Founders / Executive Team

About the Project

We are building a high-density AI datacenter campus outside Austin, Texas, beginning with approximately 7MW of NVIDIA GB300 NVL72 infrastructure and scaling to 50MW+. The initial deployment is designed around real-time inference, reasoning, and high-value AI serving workloads, with a focus on monetizing capacity in live markets rather than simply leasing powered space.

This is not a traditional datacenter operations role. We are hiring the person who will make the racks make money.

This leader will own the strategy and execution required to turn rack-scale GPU infrastructure into a profitable inference business: selecting the right models, runtimes, orchestration stack, routing layer, pricing strategy, customer segments, and marketplace relationships to maximize revenue, uptime, and utilization.

The right candidate understands that raw compute is not the business. Monetized tokens, latency-adjusted utilization, and gross margin are the business.

