Staff Infrastructure Engineer - Observability
About the role
SentinelOne is a company at the intersection of AI and security, pioneering a new operating model for cybersecurity. Our AI-native platform unifies protection across endpoint, cloud, identity, data, and AI systems to deliver autonomous detection and response with clarity and speed. By combining real-time analytics, intelligent automation, and a unified data foundation, we reduce noise, simplify complexity, and empower security teams to focus on what truly matters. Our teams are builders, problem-solvers, and innovators committed to shaping the future of security. If you are excited to solve hard problems alongside talented, mission-driven people, we invite you to help us build a safer future for humanity.
What Are We Looking For?
We're looking for people who are relentlessly curious and committed to continuous learning. AI is reshaping every function across our business, and we enable every team member, regardless of role or level, to build fluency in AI tools and concepts. Those who thrive here actively seek out new solutions, experiment thoughtfully, and apply what they learn to drive better, faster, smarter outcomes.
As a Staff Infrastructure Engineer, you'll be a pivotal technical leader and architect within our Observability team, driving strategic initiatives and shaping the future of our critical systems. You will leverage your deep expertise to design, implement, and optimize solutions that underpin SentinelOne's global platform, directly empowering engineering teams across the organization. We are seeking a candidate who is driven by a deep passion for observability and technical leadership. Imagine architecting the core systems that provide SentinelOne with real-time, global visibility, delivering actionable platform insights precisely when they are needed.
In this high-impact role, you'll design and implement robust, secure solutions for high-volume data ingestion, storage, and analysis, fundamentally shaping how we understand and optimize our platform health. This is your chance to take end-to-end ownership of critical infrastructure, mentor talented engineers, and profoundly accelerate software delivery across our entire engineering organization.
Due to Federal Government contract requirements, U.S. Citizenship is required for this position. FedRAMP staff may be subject to customer or third-party background checks up to and including Secret Clearance if required by their role at SentinelOne.
What Will You Do?
Primary responsibilities include:
Architect and implement robust, scalable telemetry platforms that empower SentinelOne engineers to deploy and monitor features with speed, safety, and reliability.
Act as the primary Subject Matter Expert (SME) and administrator for our core observability stack, including Grafana, Prometheus, Thanos/Mimir/Cortex, and OpenTelemetry (OTEL) pipelines.
Partner strategically with diverse engineering teams across the organization to define platform requirements, ensuring the observability ecosystem evolves ahead of stakeholder needs.
Take complete ownership of critical features, from initial architectural design and requirements refinement through to production deployment and operational maturity.
Drive exemplary operational efficiency for critical observability services across AWS and GCP, meticulously balancing unwavering system reliability with smart cloud cost-optimization.
Build robust automation and self-service tooling to drastically reduce operational toil, optimize resource utilization, and minimize pager fatigue.
Drive the deployment, maintenance, and compliance of observability systems in critical, high-security environments, including FedRAMP and air-gapped deployments.
Cultivate platform transparency and reliability by rigorously implementing IaC (Terraform/Ansible) and standardizing industry best practices.
Elevate engineering quality by mentoring team members, leading comprehensive technical design and code reviews, and providing constructive feedback that fosters growth.
Lead the swift resolution of highly complex production incidents, perform thorough root-cause analyses, and participate in on-call rotations to ensure peak system integrity.
What Skills and Knowledge Should You Bring?
Ideal candidates will have 8+ years of experience in Infrastructure Engineering, Site Reliability Engineering (SRE), or a related systems-focused field.
8+ years of experience in architecting, scaling, and managing enterprise-grade obs
Additional Information
Our Purpose
At SentinelOne, we are driven by a clear purpose: to give the advantage to those who secure our future. As AI reshapes how organizations build, operate, and innovate, the responsibility to protect them becomes more critical than ever. When you join SentinelOne, your work helps protect global enterprises, critical infrastructure, and the technologies shaping tomorrow. If you are motivated by meaningful challenges and want your impact to be real, measurable, and global, you will find purpose here.