Member of Technical Staff - Model Serving / API Backend Engineer
Responsibilities
- You will own the bridge between research breakthroughs and production systems.
- Turn research checkpoints into production-ready inference services
- Design and maintain high-performance APIs serving millions of requests
- Optimize inference latency and throughput across GPU infrastructure
- Build scalable serving architectures that handle unpredictable traffic
- Improve reliability, monitoring, and observability across model-serving systems
- Prototype and ship demos that showcase new capabilities in days, not weeks
- Collaborate closely with researchers to move from idea to live endpoint rapidly
Tools & Context: Model Serving & API Infrastructure
- Python, FastAPI, async systems
- GPU infrastructure, CUDA, inference optimization
- Docker and Kubernetes
- Redis, Postgres, distributed task queues
- Cloud platforms (AWS, GCP, or Azure)
- Observability stacks (metrics, logging, tracing)
- This role spans backend systems, GPU performance, and production ML serving.
Requirements
- You've built and operated systems at meaningful scale. You understand the difference between a research prototype and a production system. You are comfortable navigating ambiguity, making tradeoffs, and improving systems under real-world constraints.
- You demonstrate:
- Strong judgment around performance, reliability, and cost tradeoffs
- Experience scaling APIs or ML systems under load
- Comfort working in fast-moving, research-adjacent environments
- Ownership from system design through debugging and deployment
- Role-specific experience we value:
- Building and operating ML inference services in production
- Designing scalable API architectures with async processing
- Optimizing GPU workloads (batching, quantization, compilation, CUDA)
- Managing distributed systems and task queues under variable load
- Implementing monitoring and observability for production ML systems
- Debugging performance bottlenecks across model, infrastructure, and network layers
- Bonus experience includes:
- Real-time or low-latency inference systems
- TensorRT, reduced precision, layer fusion, or model compilation techniques
- Frontend demo tooling (Streamlit, Gradio, React)
- CI/CD and automated testing for ML systems
- Security best practices for API and model serving
How We Work Together
- Everything we do is grounded in four values:
- Obsessed. We are a frontier research lab. The science has to be right, the understanding deep, the product beautiful.
- Low Ego. The work speaks. The best idea wins, no matter who said it. Credit is shared. Nobody is above any task.
- Bold. We take the ambitious bet. We ship; we do not wait for conditions to be perfect.
- Kind. People over politics. We treat each other with genuine warmth. Agency without empathy creates chaos.
- Base Annual Salary: $180,000-$300,000 USD
- We're based in Europe and value depth over noise, collaboration over hero culture, and honest technical conversations over hype. Our models have been downloaded hundreds of millions of times, but we're still a ~50-person team learning what's possible at the edge of generative AI.
Additional Information
About Black Forest Labs
We're the team behind Latent Diffusion, Stable Diffusion, and FLUX: foundational technologies that changed how the world creates images and video. We're creating the generative models that power how people make images and video, tools used by millions of creators, developers, and businesses worldwide. Our FLUX models are among the most advanced in the world, and we're just getting started. Headquartered in Freiburg, Germany with a growing presence in San Francisco, we're scaling fast while staying true to what makes us different: research excellence, open science, and building technology that expands human creativity.
Why This Role
Our research team moves fast. Models improve weekly. New capabilities emerge constantly. What slows us down is not model quality; it's productionization.
Without this role:
- Research checkpoints sit longer before becoming usable APIs
- Inference is slower than it needs to be
- APIs struggle under load
- Demos don't reflect the true potential of our models
This role removes the bottleneck between frontier research and production reality. Once the role is filled, researchers ship faster, demos launch faster, and customers experience models at their best.