Senior Manager, Systems and Site Reliability Engineering
Job Description:
The Senior Manager, Systems and Site Reliability Engineering serves as the operational engine for Cambria's infrastructure, acting as the primary bridge between IT strategy and technical execution. This role is responsible for the performance and development of Systems Engineers and Site Reliability Engineers. While driving the modernization to a container-first model, this leader ensures that services are available and perform optimally to scale manufacturing operations. As the #2 leader in IT Operations, you will translate strategic roadmaps into daily deliverables, manage high-pressure incident response, and serve as the technical face of the team to internal business partners.

Essential Duties & Responsibilities:

Operational Leadership & Execution
Day-to-Day Delivery: Manage the day-to-day deliverables of the team, ensuring tasks align with the workstreams translated from IT leadership strategy.
Hands-on Contributions: When necessary, assist in service delivery and participate in on-call rotation to ensure critical services are available.
Agile Management: Hold daily standup cadence meetings with the team and plan/prioritize work across the team to maintain high velocity.
Strategic Alignment: Partner with senior leadership to ensure technical work is aligned with business priorities and participate in quarterly project planning.
People Management: Lead the performance and development of systems and SRE staff, defining skill competency models and gap closure plans.

Customer Engagement & Crisis Command
Front-Line Communication: Act as the primary point of contact for internal customers, clients, and users to understand needs and provide transparent technical updates.
Incident Command: Take the lead on organizing the team to resolve critical issues, including formal escalation management and standing up "war rooms" for P1/P2 items.
Root Cause Advocacy: Participate in root cause analysis after incidents to ensure permanent resolution and communicate findings to affected business stakeholders.

Infrastructure & SRE Functions
Infrastructure Engineering: Build, operate, and maintain scalable, highly available, and resilient infrastructure.
Automation Vision: Develop a vision and strategy to automate common SRE requests.
Service Monitoring: Analyze metrics to create actionable monitors or alerts to ensure critical services are available and performing well.
Modernization: Lead the adoption, implementation, and lifecycle of the container platform and associated CI/CD pipelines.
Core Systems: Plan, implement, and lifecycle Cambria's compute, storage, virtualization, and server operating systems.
Reliability: Work with product teams to ensure SLO targets are met or exceeded and participate in the on-call rotation.

Qualifications & Skills:
Motivated self-learner pushing technology solutions forward who anticipates problems and challenges
Extensive experience with container orchestration (Kubernetes, Nutanix EKS, AWS EKS) and modern CI/CD practices
Experience with implementing and maintaining virtualization platforms such as VMware/AHV
Solid foundation in Linux administration and troubleshooting
Experience writing and maintaining infrastructure as code
Experience working with Agile delivery methodologies such as Scrum and Kanban.
Familiarity and experience with ALM toolsets (Cambria uses Jira) and collaboration software (such as Slack, G-Suite, and Confluence).
Strong leadership and management skills.
Excellent communication and interpersonal skills.
Strong business acumen; can easily articulate complex ideas clearly to all levels of leadership
The ability to pivot and drive change in an ever-changing environment
Excellent time management and organizational skills
Proven track record of delivering results
Strong motivational, influential and organized leadership skills to lead the team to accomplish goals
Strong analytical and problem-solving skills.
Ability to thrive in a fast-paced, dynamic environment.

Minimum Requirements:
Education: Bachelor's degree in Computer Science, Electrical Engineering, or related technical degree or equivalent experience.
Experience: 10+ years working in a Systems Administrator/Systems Engineering position
Systems: Experience with some or all of the following software/tools or close equivalents:
VMware
Nutanix NX Appliances
Nutanix AHV
Nutanix EKS
AWS EKS
Ansible
Backup Software (Cohesity)
Pure Storage
Windows Server
Linux (Red Hat Enterprise Linux)
Active Directory
Okta
AWS
Terraform
Git
Red Hat Satellite
Red Hat Identity Management

Additional Requirements:
Managerial Requirements: Yes
Physical Requirements: Office position requiring telephone and computer use. Able to lift up to 20 pounds occasionally / sparingly (office supplies, packages, samples). May need to bend, reach, walk, or stoop occasionally. Prolonged periods sitting and/or standing at a desk and working on a computer.
Travel Requirements: Travel between local C