Senior AI-Native DevOps / Operations Engineer (AMER)
Responsibilities
- Design, build, and improve the production platform powering Valency
- Tighten CI/CD processes so changes are tested, gated, observable, and safe to ship
- Improve production reliability, latency, deployment safety, and incident response
- Build the operational feedback loops that help engineering and product teams act on real production behavior
- Establish the right logging, analytics, tracing, alerting, and workflow instrumentation as the platform scales
- Define and implement guardrails for agent-involved software delivery and operations
- Introduce human-in-the-loop approval flows where autonomy needs stronger controls
- Improve cost efficiency across cloud infrastructure and platform operations
- Help shape security, compliance, and auditability foundations for SOC 2, ISO 27001, and FedRAMP-oriented environments
- Contribute to the long-term platform engineering direction as the team grows and specializes
As the senior engineer on-site, you will:
- Own production operations and operational excellence for this function
- Lead incident response expectations for the role
- Establish the operating model the broader team will scale on
- Work onsite in the SF Bay Area
What Success Looks Like
- In the first 6-12 months, you will help Valency begin tracking and materially improve:
- Deployment frequency and release confidence
- Change failure rate and rollback quality
- MTTR and incident handling
- p95 / p99 latency and system responsiveness
- Uptime and service reliability
- Alert quality and signal-to-noise ratio
- Infrastructure cost efficiency
- Operational visibility into agent workflows and production behavior
- Guardrail coverage for agent-authored or agent-assisted changes
What You'll Work With
- Today the platform makes use of AWS and adjacent infrastructure including:
- ECS / Fargate
- EKS / container orchestration environments
- RDS
- S3
- Cloudflare
- CloudWatch
- Queues, caches, schedulers, and batch / background processing systems
- We currently use GitHub Actions and expect this person to help evolve that into a stronger long-term platform engineering and delivery foundation
- Our observability and analytics stack is still open for innovation. We want someone who is comfortable evaluating the tradeoffs and building the right system as complexity grows.
What Makes This Role AI-Native
- This is not "DevOps, but with AI in the title."
- You will help build the operational system around software and workflows that increasingly involve agents. That includes:
- Tracing workflows across agent-driven and human-driven systems
- Developing production guardrails to keep systems from going off the rails
- Designing approval paths for high-risk or high-impact actions
- Turning production signals into actionable inputs for product and engineering
- Helping close the loop between what the system is doing, how users experience it, and how the platform should evolve
- We do not require prior experience operating AI-native systems at scale. We do require strong judgment, strong production systems experience, and a willingness to build the right AI-era operating model.
- Own and improve CI/CD pipelines, release controls, and deployment workflows
- Build and maintain highly reliable AWS-based production systems
- Improve observability across logs, metrics, traces, events, and workflow state
- Instrument platform behavior so system issues, regressions, and slowdowns are quickly visible and actionable
- Create operational analytics that h
Additional Information
About Valency
Valency Systems is a small, dynamic team of engineers, scientists, and researchers building the global hub for the agentic research era. We're based in Berkeley, California, and we're building something that matters. If you care about open science, advancing research at the speed of thought, and using AI to accelerate discovery, we'd love to talk. Our team is hybrid. We come together in person 3 days a week, with the option for 2 days of remote work.

The Position
We're hiring an AI-native DevOps / Operations Engineer to help build and operate the platform behind Valency. This is not a narrow infrastructure maintenance role. We want builders who can design and harden production systems, improve CI/CD and release quality, improve reliability and response times, and create the observability, analytics, and guardrails needed to safely operate a rapidly evolving platform.

This role sits at the intersection of platform engineering, cloud infrastructure, production operations, and AI-era software delivery. You will help close the loop from agentically written software to reliable, performant systems in production. That means better tests, better release controls, stronger guardrails, richer production telemetry, clearer workflows for human approval, and tighter feedback into product and engineering.

This is an especially strong fit for someone who has helped scale high-growth SaaS systems, likes building from first principles, and wants to experience that kind of growth again in a new context.