Sr Manager, Platform Engineering
Job Description:
Flexential is hiring a Platform Engineering leader in the IT organization to plan roadmaps, establish requirements, and develop and operationally manage platform technologies including Observability, DevOps, ITSM, and Integrations. Current platform initiatives include building a next-gen OpenTelemetry observability platform for 40+ data center facilities and platforms using an LGTM stack (Loki, Grafana, Tempo, Mimir), and enabling a secure, high-velocity SDLC capability with paved pathways, engineering excellence measurements, and DevSecOps across multiple development teams.

This role sits at the intersection of engineering management and hands-on technical work. You will lead a team of platform engineers, create and capture requirements, establish and own technical planning and implementation, and be accountable for platform reliability, security, and delivery timelines. This is a high-visibility, high-impact role: the platforms you build will be foundational to Flexential's IT services and will enable AIOps and AI infrastructure.

Key Responsibilities and Essential Job Functions:
Lead the design, development, deployment, and operational management of automated, resilient, high-availability, self-healing, secure platforms with native-AI capabilities for IT needs, serving both internal and customer business capabilities.
Lead, build, and manage the Platform Engineering team and function: hiring, mentoring, performance management, and technical roadmap ownership.
Plan, build, and operate an OpenTelemetry Observability platform with technologies including Grafana, Mimir, Loki, Tempo, and Alertmanager on Kubernetes/RKE2 using Helm and ArgoCD.
Build an automated federated Observability Edge Stack: Prometheus + OTel collector nodes deployed per site, Zabbix auto-discovery configuration, and a Prometheus scrape profile library for 10+ device classes (Cisco, Juniper, Dell, NetApp, etc.).
Design, develop, and manage engineering lifecycle platforms for high-velocity, secure SDLC using GitLab and related technologies.
Build and operate IaC and CI/CD platforms including GitLab CI/CD, Terraform, Ansible AWX, Helm, and ArgoCD for automated provisioning and application deployment.
Own, enhance, and operate critical IT platform technologies, e.g. Boomi for integrations and AWS for cloud environments, including their hosted infrastructure.
Establish and enforce platform security posture: secrets management via CyberArk/Conjur, RBAC, mTLS, compliance boundary design, and zero inbound telemetry architecture.
Build and integrate ITSM capabilities for various platforms, e.g. automated incident creation, CI enrichment, and CMDB correlation.
Define and implement extensibility patterns including AIOps, e.g. anomaly detection hooks, event correlation pipeline design, and integration with future ML/AI tooling.
Partner with other IT and business teams for App Dev, requirements capture, delivery validation, and integration needs.
Represent platform engineering in cross-functional architecture reviews and executive-level program updates.
Perform other management and technical duties as required and assigned for team and operational resilience, e.g. team building, on-call rotation, etc.
Travel may be required to team or project events.

Required Qualifications
12+ years of relevant technical experience, with 4+ years in a management (or Principal-level) role leading an engineering team
DevOps / Platform Engineering - 8+ years: end-to-end ownership of developer/infrastructure platforms; Kubernetes, Helm, ArgoCD, service mesh, containerized workloads
GitOps / CI-CD - 5+ years: GitLab CI/CD, pipeline authoring, infrastructure-as-code delivery
8+ years of expert-level automation frameworks experience with Python, Terraform, Ansible, etc.
Infrastructure (Linux/VM) - 8+ years: Linux systems administration, VM lifecycle (VMware vCenter/VCF), NetApp storage and compute provisioning
Networking - 3+ years of working knowledge: TCP/IP, BGP/OSPF, SNMP
AI tooling - strong understanding of (or 1+ years of experience with) MCP, agentic workflows, and SRE workflows, e.g. AIOps for anomaly detection, event correlation, and alert noise reduction on the Prometheus and Grafana stack
Secrets & Security - 4+ years: CyberArk, Conjur, Vault, or equivalent; RBAC design, compliance boundary architecture
Engineering Management - 4+ years: hiring, team building, performance management, roadmap ownership for teams of 5+ engineers
Other training and experience may be substituted for the job requirements at the discretion of the manager.

Preferred Skills
Hands-on experience or working knowledge of Boomi integration PaaS (iPaaS) technologies
Experience with design and development of DR test application/automation and process workflows for corporate BCP execution
Hands-on experience working with AWS products in a Well-Architected Framework and multi-account model to develop various compute, storage, network IaaS and