Design and develop internal tools, CLIs, and APIs (primarily in Go and Python) that enable infrastructure self-service, automate complex workflows, and improve operational efficiency
Build integrations between infrastructure systems -- connecting CMDB/IPAM (NetBox), secrets management (HashiCorp Vault), hypervisor APIs (Proxmox), monitoring platforms, and CI/CD pipelines into cohesive automated workflows
Develop and maintain API clients and libraries for interacting with infrastructure services (Proxmox API, Vault API, NetBox API, iLO Redfish, container registries)
Write well-tested, documented, and maintainable code with proper versioning, release processes, and code review practices
Infrastructure as Code & Ansible Development
Architect, develop, and refactor Ansible roles and playbooks across a large-scale inventory spanning 30+ datacenters, 80+ group variable files, and 40+ roles
Design reusable, composable Ansible role patterns that scale cleanly as the DC footprint grows -- new DCs should be deployable with minimal variable additions
Improve idempotency, error handling, and test coverage across the existing Ansible codebase
Develop custom Ansible modules and plugins (including lookup plugins) where upstream modules are insufficient (e.g., custom Vault integration, Proxmox API interactions, iLO automation)
Automate bare-metal server lifecycle end-to-end: from iLO bootstrap through OS installation, hypervisor configuration, VM provisioning, and service deployment
CI/CD Pipeline Engineering
Design, write, and maintain GitLab CI pipelines for infrastructure automation, including multi-stage deployment workflows with linting, validation, canary testing, and regional rollout
Build pipeline patterns for safe infrastructure changes: staged rollouts, automated rollback, drift detection, and change validation
Create reusable pipeline templates and shared CI components that standardize how infrastructure changes are tested and deployed
Implement automated testing for Ansible roles and infrastructure changes (Molecule, ansible-lint, integration testing in ephemeral environments)
Kubernetes & Container Platform Automation
Develop automation for self-hosted Kubernetes cluster lifecycle management: provisioning, upgrades, scaling, and disaster recovery
Build and maintain container image build pipelines, registry management, and image promotion workflows
Create Kubernetes operators or controllers (in Go) where custom automation of cluster-level concerns is needed
Automate workload deployment patterns, including Helm chart development and GitOps workflows
Cloud Infrastructure Automation
Develop IaC and automation for AWS and Azure resources, integrating cloud infrastructure with on-premises systems
Build automation that spans hybrid environments -- coordinating deployments across bare-metal, virtualized, and cloud targets from a unified pipeline
Observability & Reliability Engineering
Instrument internal tools and automation with proper logging, metrics, and tracing
Build automated remediation workflows that respond to monitoring alerts and reduce mean time to recovery
Develop reporting and dashboards that provide visibility into infras
Benefits
Vision insurance
Additional Information
At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.
Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.
F5 is bringing a better digital world to life by helping organizations create, secure, and run applications that power our lives. Within the Platform Engineering team, this role helps ensure our platform is operated safely, reliably, and with operational excellence.
We are looking for a Senior Site Reliability Engineer who leads with kindness and has a strong software development background to join our Infrastructure Engineering team. Your primary focus will be building automation, tooling, and internal platforms that enable our team to operate a global, multi-datacenter infrastructure spanning a growing number of Points of Presence.
Deep familiarity with production infrastructure -- bare-metal hypervisors, containerized workloads, Kubernetes clusters, and cloud platforms -- is essential, but your primary lens should be on automation and code. You will develop internal tools and APIs in Python and Go, design and maintain Ansible automation across hundreds of hosts, build CI/CD pipelines, and create self-service interfaces that reduce toil and eliminate manual operations.
You will work within a PCI DSS-compliant environment and participate in a 24x7 on-call rotation.