Lead the management and optimization of production environments at scale, ensuring high availability and performance across cloud-native infrastructures.
Design and implement AI-augmented observability frameworks, leveraging machine learning to identify patterns, predict failures, and automate anomaly detection.
Drive the operability roadmap by implementing advanced automation, CI/CD enhancements, and business continuity planning for complex, multi-region services.
Resolve complex system and network problems by applying sophisticated analytical thought and innovative solutions.
Participate in a senior-level on-call rotation, leading incident resolution and conducting deep-dive post-mortems to prevent recurrence.
Requirements
BS/MS in Computer Science or equivalent degree.
5+ years of experience in DevOps, Production Engineering, or SRE roles within high-scale environments.
Experience with containerization and orchestration (Docker, Kubernetes) and cloud platforms (AWS or GCP).
Familiar with Python, Node.js or Go, with a focus on building scalable automation.
Intermediate level of coding expertise in one or more languages, including Node.js, Python, or Go.
Experience working with IaC (e.g., Terraform, Ansible).
Experience using Git to manage code.
Experience building CI/CD pipelines.
Expert knowledge of TCP/IP, networking, and security best practices.
Deep understanding of UNIX/Linux system internals and kernel-level troubleshooting.
Experience working with GitHub Actions.
Experience with system reliability tools including OpenTelemetry, Prometheus, Splunk, and Grafana.
A proven track record of staying current on emerging AI tools and proactively suggesting workflow improvements.
If you're curious about how this factors into this role, please discuss with the recruiter.
Currently work for Yahoo? Please apply on our internal career site.
Benefits
Flexible schedule
Additional Information
Yahoo serves as a trusted guide for hundreds of millions of people globally, helping them achieve their goals online through our portfolio of iconic products. For advertisers, Yahoo Advertising offers omnichannel solutions and powerful data to engage with our brands and deliver results.
A Little About Us
Commerce is a fast-growing business for Yahoo. We are expanding our US Commerce Production Engineering (PE) team to help implement production stability improvements. We build and scale highly performant, reliable, and secure infrastructure. As we enter the next phase of growth, we are looking for senior technical leaders who can marry traditional engineering excellence with modern, AI-augmented workflows to drive unparalleled system reliability.
A Lot About You
The Yahoo Production Engineering team is seeking an experienced Sr. Production Engineer with expertise in AWS (or GCP). We are looking for a technical leader with a commitment to solving complex infrastructure challenges and a focus on building the next generation of resilient systems.
You are a problem solver who enjoys digging into code and owning the reliability domain. As a member of the PE team, you will lead operability improvements, security, infrastructure, and automation initiatives. You are comfortable operating in an evolving, AI-augmented environment.