Senior DevOps Engineer (Remote Eligible in Bulgaria)
For over 20 years, Smartsheet has helped people and teams achieve, well, anything. From seamless work management to smart, scalable solutions, we've always worked with flow. We're building tools that empower teams to automate the manual, uncover insights, and scale smarter. But more than that, we're creating space: space to think big, take action, and unlock the kind of work that truly matters. Because when challenge meets purpose, and passion turns into progress, that's magic at work, and it's what we show up for every day.

We are seeking a Senior DevOps Engineer to join the team that owns and operates Smartsheet's edge proxy platform and internal engineering tooling. The edge proxy is a custom-built, high-performance reverse proxy that serves as the entry point for all traffic across Smartsheet's commercial and FedRAMP-authorised US Government environments. The team owns the full lifecycle of this platform: from maintaining the proxy binary and its configuration tooling, to provisioning and operating the multi-region AWS EKS infrastructure it runs on, to managing the observability and on-call story for every service that routes through it. Beyond the edge layer, the team also drives the evaluation and adoption of internal developer tooling, from developer portals to productivity platforms, that improves the engineering experience across the company. You will work remotely from Bulgaria.

You Will

Own and evolve the edge proxy platform: Maintain, upgrade, and extend a high-performance reverse proxy, including maintaining the proxy binary and its configuration tooling, writing Go and Python automation, managing the full container image lifecycle on hardened Linux base images, and working across the broader edge layer, including CDN, WAF, and traffic management capabilities.
Build and maintain cloud infrastructure as code: Design and implement Terraform/Terragrunt modules and live environment configurations managing EKS clusters, load balancers, IAM roles, VPC networking, ECR registries, and supporting AWS services across multiple regions including GovCloud.
Operate Kubernetes clusters at scale: Manage multi-region, multi-cluster EKS deployments via FluxCD GitOps workflows and Helm charts, including node AMI rotation, add-on lifecycle management, and horizontal pod autoscaling.
Build and own CI/CD pipelines: Design, maintain, and improve shared GitLab CI/CD pipeline templates used across all team repositories; build and operate alternative pipeline workflows for isolated government cloud environments.
Automate operational toil: Build and maintain tooling for tasks such as container image patching, EKS AMI rotation, air-gapped ECR image sync to GovCloud, and automated MR creation for monthly version-bump patching cycles.
Manage observability and on-call: Provision and maintain Datadog SLOs, monitors, and dashboards via Terraform; participate in the team's on-call rotation responding to edge proxy incidents across production and GovCloud environments.
Support FedRAMP/GovCloud operations: Operate the GovCloud environment with its unique constraints, including air-gapped image distribution, infrastructure automation in isolated networks, and alert management with compliance-aware data handling.
Evaluate and adopt internal developer tooling: Research, prototype, and drive the adoption of internal tools that improve engineering productivity across the company, including developer portals, platform self-service capabilities, and other tooling that raises the bar for the developer experience at Smartsheet.
Mentor and collaborate: Share knowledge across the team through code reviews, architecture discussions, and runbook authorship; foster a culture of engineering excellence and operational rigour.
Strategically apply AI tools: Apply and champion AI tools within your team's domain to improve project execution, infrastructure design, quality, and debugging, and lead the adoption of AI best practices.

You Have

5+ years of experience in DevOps, platform engineering, or site reliability engineering.
A BS or MS in Computer Science, Engineering, or a related field, or equivalent industry experience.
Deep proficiency with Terraform and Terragrunt for managing production cloud infrastructure at scale across multiple environments and regions.
Strong Kubernetes expertise, including EKS cluster operations and Helm chart authoring.
Hands-on experience with AWS networking and container workload services: EKS, ALB/NLB, VPC, IAM, ECR, Route53, CloudWatch, and EventBridge.
Proficiency in at least one general-purpose programming language (Go or Python preferred) for building operational tooling and automation.
Solid understanding of reverse proxies, API gateways, or load balancers (NGINX, HAProxy, or equivalent).
Experience designing and maintaining CI/CD pipelines (GitLab CI preferred), including shared template libraries across multiple repositories.
Experience with container image security practices: hard