Sr Manager, Platform Engineering

Flexential · Denver Corp, CO
Full-time · On-site · 4d ago
Ansible · ArgoCD · AWS · BGP · CI/CD · Compliance

Benefits

Vision insurance

Additional Information

Job Description: Flexential is hiring a Platform Engineering leader in the IT organization to plan roadmaps, establish requirements, and develop and operationally manage platform technologies including Observability, DevOps, ITSM, and Integrations. Current platform initiatives include building a next-gen OpenTelemetry observability platform for 40+ data center facilities and platforms using an LGTM stack (Loki, Grafana, Tempo, Mimir), and enabling a secure, high-velocity SDLC capability with paved pathways, engineering excellence measurements, and DevSecOps across multiple development teams.

This role sits at the intersection of engineering management and hands-on technical work. You will lead a team of platform engineers, create and capture requirements, establish and own technical planning and implementation, and be accountable for platform reliability, security, and delivery timelines. This is a high-visibility, high-impact role: the platforms you build will be foundational to Flexential's IT services and will enable AIOps and AI infrastructure.

Key Responsibilities and Essential Job Functions:
- Lead the design, development, deployment, and operational management of automated, resilient, highly available, self-healing, secure platforms with AI-native capabilities for IT needs, serving both internal and customer business capabilities.
- Lead, build, and manage the Platform Engineering team and function: hiring, mentoring, performance management, and technical roadmap ownership.
- Plan, build, and operate an OpenTelemetry observability platform with technologies including Grafana, Mimir, Loki, Tempo, and Alertmanager on Kubernetes/RKE2 using Helm and ArgoCD.
- Build an automated, federated Observability Edge Stack: Prometheus + OTel Collector nodes deployed per site, Zabbix auto-discovery configuration, and a Prometheus scrape profile library for 10+ device classes (Cisco, Juniper, Dell, NetApp, etc.).
- Design, develop, and manage engineering lifecycle platforms for high-velocity, secure SDLC using GitLab and similar or related technologies.
- Build and operate IaC and CI/CD platforms, including GitLab CI/CD, Terraform, Ansible AWX, Helm, and ArgoCD, for automated provisioning and application deployment.
- Own, enhance, and operate critical IT platform technologies, e.g., Boomi for integrations and AWS for cloud environments, including their hosted infrastructure.
- Establish and enforce platform security posture: secrets management via CyberArk/Conjur, RBAC, mTLS, compliance boundary design, and zero-inbound telemetry architecture.
- Build and integrate ITSM capabilities for various platforms, e.g., automated incident creation, CI enrichment, and CMDB correlation.
- Define and implement extensibility patterns, including AIOps, e.g., anomaly detection hooks, event correlation pipeline design, and integration with future ML/AI tooling.
- Partner with other IT and business teams on app development, requirements capture, delivery validation, and integration needs.
- Represent platform engineering in cross-functional architecture reviews and executive-level program updates.
- Perform other management and technical duties as required and assigned for team and operational resilience, e.g., team building, on-call rotation, etc.
- Travel may be required for team or project events.

Required Qualifications:
- 12+ years of relevant technical experience, with 4+ years in a management (or Principal-level) role leading an engineering team.
- DevOps / Platform Engineering - 8+ years: end-to-end ownership of developer/infrastructure platforms; Kubernetes, Helm, ArgoCD, service mesh, containerized workloads.
- GitOps / CI/CD - 5+ years: GitLab CI/CD, pipeline authoring, infrastructure-as-code delivery.
- 8+ years of expert-level experience with automation frameworks and tools such as Python, Terraform, Ansible, etc.
- Infrastructure (Linux/VM) - 8+ years: Linux systems administration, VM lifecycle (VMware vCenter/VCF), NetApp storage and compute provisioning.
- Networking - 3+ years of working knowledge: TCP/IP, BGP/OSPF, SNMP.
- AI tooling - strong understanding of (or 1+ years of experience with) MCP, agentic workflows, and SRE workflows, e.g., AIOps for anomaly detection, event correlation, and alert noise reduction on the Prometheus and Grafana stack.
- Secrets & Security - 4+ years: CyberArk, Conjur, Vault, or equivalent; RBAC design; compliance boundary architecture.
- Engineering Management - 4+ years: hiring, team building, performance management, and roadmap ownership for teams of 5+ engineers.

Other training and experience may be substituted for the job requirements at the discretion of the manager.

Preferred Skills:
- Hands-on experience with or working knowledge of Boomi integration PaaS (iPaaS) technologies.
- Experience designing and developing DR test applications/automation and process workflows for corporate BCP execution.
- Hands-on experience working with AWS products in a Well-Architected Framework and multi-account model to develop various compute, storage, network IaaS and

