
Staff Software Engineer (Platform Architecture & Execution Model)

Red Cell Partners · Seattle, WA
Full-time · Remote · 5d ago
API Design · Hugging Face · Incident Response · Observability · System Design

About the role

Red Cell Partners is an incubation firm building and investing in rapidly scalable, technology-led companies that are bringing revolutionary advancements to market in three distinct practice areas: healthcare, cyber, and national security. United by a shared sense of duty and a deep belief in the power of innovation, Red Cell is developing powerful tools and solutions to address our Nation's most pressing problems.

About Trase

Trase Systems, co-founded in 2023 by Joe Laws and Grant Verstandig, is AI, Uncomplicated. Trase empowers enterprise leaders to harness the full potential of AI without the associated complexity and risks. We are an end-to-end solution for deploying, managing, and optimizing AI in the enterprise. Our platform specializes in bridging the "last mile" of AI adoption, unlocking AI's full potential while driving efficiency and significant cost savings.

Trase is at the forefront of AI agent innovation, topping the Hugging Face GAIA Leaderboard for General AI Assistants, ahead of industry giants such as Google, Meta, Microsoft, and OpenAI. We are leveraging our cutting-edge technologies to develop mission-critical agentic applications in complex industries such as healthcare, oil and gas, and national security.

As Staff Software Engineer, you'll own the core execution model and platform architecture of Trase OS, the shared platform ("agentic operating system") that powers all Trase deployments in regulated environments. You'll define the abstractions and APIs that connect workflows, agents, tools, and product surfaces, and ensure the correctness, scalability, and extensibility of the system. This is a company-critical role: you are responsible for how the system behaves under real-world conditions, including failure, scale, and security constraints. Your work sets the technical direction for the platform and acts as a force multiplier across all engineering teams.

Clean abstractions and correctness under failure are critical because we operate long-lived agents in healthcare and defense environments where auditability and reliability are non-negotiable.

Why This Role Is Needed

Trase OS is an orchestration-heavy system coordinating long-lived workflows, agents, and tools across multiple services and environments. As the platform evolves, the primary risks shift from implementation to system design quality:

  • Poor abstractions create tight coupling across services
  • Workflow execution becomes difficult to reason about under failure
  • Platform capabilities fragment instead of becoming reusable primitives
  • Scaling introduces complexity instead of leverage

This role exists to:

  • Define clean, durable abstractions for the platform execution model
  • Ensure correctness and determinism in workflow execution
  • Translate evolving product requirements into coherent platform architecture
  • Enable teams to build on Trase OS without introducing systemic complexity

What Makes This Role Hard

  • You are designing systems where failure is the norm, not the exception, and correctness must be preserved across retries, restarts, and partial execution
  • You must balance clean abstractions with real-world constraints (performance, security, multi-tenant environments)
  • Decisions made here become foundational primitives used across all products and teams
  • The system must remain understandable and auditable, even as complexity and scale increase

Responsibilities

  • Develop the core execution model (state machine, lifecycle, resource model, failure semantics)
  • Design platform APIs/SDKs connecting workflows, agents, tools, and product surfaces; drive versioning & compatibility
  • Guarantee correctness via idempotency, deterministic replays, compensating actions, and data integrity
  • Engineer reliability at scale: concurrency controls, rate limits, backpressure, sharding/partitioning, and workload isolation
  • Build security & governance into the core: RBAC/ABAC, policy enforcement, fine-grained audit & lineage
  • Deliver observability: distributed tracing, structured logs, metrics, and evaluation hooks; build an "explainable trail" of agent actions
  • Own quality: design reviews, test strategy (unit, property, chaos), performance baselines, SLOs, incident response, and postmortems
  • Mentor & unblock senior engineers; partner with Product, Security, and Customer teams to translate requirements into durable primitives
  • Make pragmatic choices on storage, queueing, and compute; create paved roads that accelerate all other teams
  • Define system boundaries and reduce cross-service coupling through clear architectural patterns
  • Drive platform-wide standards for correctness, reliability, and API design across teams
  • Balance short-term delivery with long-term architectural integrity, ensuring the platform evolves without accumulating systemic risk

Requirements

  • 10+ years of experience building distributed/platform systems, including significant experience defining architecture across teams or domains
  • Experience build

Benefits

Health insurance
