Sr. DevOps Engineer II (Remote Eligible in Bulgaria)
For over 20 years, Smartsheet has empowered teams to manage work seamlessly and scale solutions smarter. Now, in our most ambitious chapter yet, we are uniting human teams with AI agents. By orchestrating the work agents do best, automating manual tasks and uncovering insights at scale, we create the space for people to focus on what truly matters: judgment, creativity, and big thinking. That is magic at work, and it's what we show up for every day.

We are seeking a Senior DevOps Engineer to join the team that owns and operates Smartsheet's edge proxy platform and internal engineering tooling. The edge proxy is a custom-built, high-performance reverse proxy that serves as the entry point for all traffic across Smartsheet's commercial and FedRAMP-authorised US Government environments. The team owns the full lifecycle of this platform: from maintaining the proxy binary and its configuration tooling, to provisioning and operating the multi-region AWS EKS infrastructure it runs on, to managing the observability and on-call story for every service that routes through it. Beyond the edge layer, the team also drives the evaluation and adoption of internal developer tooling - from developer portals to productivity platforms - that improves the engineering experience across the company.

You will work remotely from Bulgaria and will be reporting to the Engineering Manager.

You Will:

Own and evolve the edge proxy platform by maintaining, upgrading, and extending a high-performance reverse proxy - including maintaining the proxy binary and its configuration tooling, writing Go and Python automation, managing the full container image lifecycle on hardened Linux base images, and working across the broader edge layer, including CDN, WAF, and traffic management capabilities.

Build and maintain cloud infrastructure as code by designing and implementing Terraform/Terragrunt modules and live environment configurations managing EKS clusters, load balancers, IAM roles, VPC networking, ECR registries, and supporting AWS services across multiple regions including GovCloud.

Operate Kubernetes clusters at scale by managing multi-region, multi-cluster EKS deployments via FluxCD GitOps workflows and Helm charts, including node AMI rotation, add-on lifecycle management, and horizontal pod autoscaling.

Build and own CI/CD pipelines by designing, maintaining, and improving shared GitLab CI/CD pipeline templates used across all team repositories, and by building and operating alternative pipeline workflows for isolated government cloud environments.

Automate operational toil by building and maintaining tooling for tasks such as container image patching, EKS AMI rotation, air-gapped ECR image sync to GovCloud, and automated merge request creation for monthly version-bump patching cycles.

Manage observability and on-call responsibilities by provisioning and maintaining Datadog SLOs, monitors, and dashboards via Terraform, and by participating in the team's on-call rotation responding to edge proxy incidents across production and GovCloud environments.

Support FedRAMP/GovCloud operations by operating the GovCloud environment with its unique constraints - air-gapped image distribution, infrastructure automation in isolated networks, and alert management with compliance-aware data handling.

Evaluate and adopt internal developer tooling by researching, prototyping, and driving the adoption of internal tools that improve engineering productivity across the company - including developer portals, platform self-service capabilities, and other tooling that raises the bar for the developer experience at Smartsheet.

Mentor and collaborate with peers across the team through code reviews, architecture discussions, and runbook authorship, fostering a culture of engineering excellence and operational rigour.

Strategically apply AI tools within the team's domain to improve project execution, infrastructure design, quality, and debugging, and lead adoption of AI best practices across the team.

Apply sustained focus and independent judgement when troubleshooting complex, multi-system incidents, managing cognitive load across concurrent workstreams in a high-availability environment.

Maintain resilience and composure when navigating ambiguous or high-pressure operational situations, including production incidents and compliance-sensitive environments, demonstrating the psychosocial steadiness required of an on-call engineer.

Perform standard sedentary work involving extended screen use in a remote environment, including standard computer and keyboard use.

Work within Smartsheet's fully remote, geographically distributed team structure, contributing to a psychologically safe and inclusive team environment where diverse perspectives and approaches are welcomed.

This role operates within standard business hours with flexibility expected around on-call responsibilities. No regular travel is required.

You Have:

8+ years of experience in DevOps,