Senior Site Reliability Engineer
Responsibilities
- Serve as a primary owner for the reliability, availability, performance, operability, and capacity of one or more production services
- Deploy, operate, maintain, and continuously improve production services running in Autodesk GovCloud environments
- Partner with engineering teams to ensure services are designed with reliability, scalability, security, and operability in mind
- Define and operate reliability practices such as SLOs/SLIs, error budgets, production readiness reviews, service reviews, and operational health reviews
- Build automation to improve deployment safety, operational efficiency, incident response, and service recovery
- Design, develop, and maintain software, automation, and tooling that improve the reliability, scalability, and efficiency of production systems
- Implement and improve monitoring, alerting, logging, tracing, and observability capabilities across supported services
- Lead and participate in incident response, troubleshooting, and post-incident reviews focused on learning and continuous improvement
- Develop and maintain operational documentation, runbooks, and recovery procedures
- Scale and enhance resilience testing and Gameday practices to validate system behavior, recovery capabilities, and operational readiness
- Continuously identify and eliminate operational toil through software engineering, automation, and process improvement
- Ensure supported services remain compliant with Autodesk security, privacy, and regulatory requirements, including FedRAMP and related controls where applicable
- Participate in a 24x7 on-call rotation for production services
- Function effectively in a fast-paced environment while helping establish and mature operational excellence practices for Autodesk GovCloud
Requirements
- B.S. or higher in Computer Science, Engineering, or a related technical discipline, or equivalent practical experience
- 7+ years of experience in Site Reliability Engineering, Software Engineering, Platform Engineering, Cloud Infrastructure, or Production Operations
- Experience operating and supporting customer-facing production services in large-scale cloud environments
- Strong understanding of reliability engineering principles, including SLOs/SLIs, observability, incident management, capacity planning, production readiness, and automation
- Experience with AWS, Azure, or other public cloud platforms
- Experience developing automation using languages such as Python, Go, Java, PowerShell, Bash, or similar
- Experience with Infrastructure as Code, CI/CD pipelines, deployment automation, and modern cloud operations practices
- Understanding of security, compliance, and operational risk management in production environments
- Strong written and verbal communication skills
Preferred Qualifications
- 10+ years of experience operating highly available, customer-facing production systems
- Experience with AWS GovCloud, FedRAMP, IL4/IL5, or other regulated cloud environments
- Experience supporting services with stringent availability, reliability, and security requirements
- Experience with containers, Kubernetes, cloud-native architectures, APIs, load balancing, networking, DNS, and distributed systems
- Experience with observability platforms such a
Additional Information
Job Requisition ID # 26WD99276

Position Overview

Want to help make a better world? As a Senior Site Reliability Engineer at Autodesk, you can help us build and operate reliable, secure, and scalable cloud services for Autodesk GovCloud products.

As part of a new SRE team supporting Autodesk GovCloud, you will have a unique opportunity to help shape how Autodesk deploys, runs, and improves production services in restricted cloud environments. This is a foundational role where you will help establish the operating model, reliability practices, automation, and engineering standards needed to support critical customer-facing services.

You will combine software engineering and production operations to deploy, run, monitor, improve, and automate Autodesk services in GovCloud. You will partner closely with product engineering, security, compliance, platform, and infrastructure teams to ensure services are reliable, scalable, secure, and ready for production.

The ideal candidate has deep experience operating production systems at scale, an automation-first mindset, and the ability to improve reliability through engineering practices such as SLOs/SLIs, production readiness, incident management, observability, resilience testing, and toil reduction. Success in this role requires strong technical judgment, a customer-focused mindset, and a passion for using software engineering to solve operational problems at scale.

In accordance with GovCloud Cloud Service Provider Security Requirements, this role must be performed by U.S. Citizens. Employment is contingent upon meeting all applicable government security and eligibility requirements, including necessary background investigations and government-issued security clearances.