Engineer II - Site Reliability (Hybrid, IND)

CrowdStrike · India

Full-time · On-site · Today

Ansible · ArgoCD · AWS · Bash · Capacity Planning · Documentation

Responsibilities

Operate Temporal infrastructure in production - deploy updates, monitor cluster health, respond to alerts, and maintain availability across multiple environments using Helm, Kubernetes and FluxCD
Automate operational work - write scripts and workflows that make deployments, upgrades, scaling operations, and troubleshooting repeatable and safe; reduce manual toil over time
Support capacity planning and performance tuning - track resource utilization, identify bottlenecks, tune configuration for better performance and contribute to capacity forecasts under guidance
Build observability - instrument services with metrics and logs, improve dashboards, and refine alerting so the team can catch problems before they impact users
Contribute to the on-call rotation - participate in incident response, learn how to triage and escalate issues effectively, and write runbooks that help the next person on call
Learn GitOps workflows - work with FluxCD to manage infrastructure-as-code, submit pull requests for configuration changes, and understand how declarative deployment pipelines work
Troubleshoot operational issues - investigate deployment failures, connectivity problems, and performance degradations, and work with teammates to determine root causes and preventive fixes
Partner with consuming teams - help internal engineers onboard to Temporal, answer questions, debug integration issues, and contribute to documentation that makes adoption easier
Grow your infrastructure skills - work with PostgreSQL, AWS/GCP, Kubernetes networking, Helm chart management, certificate rotation, secret management and distributed systems operations under mentorship

Requirements

3+ years in DevOps, SRE, platform engineering or infrastructure roles - you've worked on production systems and understand the basics of running services reliably
Kubernetes fundamentals - you've deployed services to Kubernetes, understand pods/deployments/services, and can debug basic cluster issues; you don't need deep expertise but should be comfortable navigating kubectl and reviewing YAML
Helm experience - you've used Helm to deploy applications, understand charts and values files, and can troubleshoot failed releases
Some infrastructure-as-code experience - you've used tools like Terraform, Ansible, or GitOps workflows (FluxCD, ArgoCD) to manage infrastructure declaratively rather than clicking in consoles
Cloud platform exposure - you've worked with AWS or GCP in some capacity; you understand basic compute, networking, and storage primitives but don't need to be an expert
Scripting ability - you can write scripts (in Bash, Python, or Go) to automate repetitive tasks and build simple tooling
Basic understanding of stateful systems - you've worked with databases (PostgreSQL preferred) or other persistent services and understand backups, schema management, and connection handling at a foundational level
Willingness to learn and ask for help - you're comfortable saying "I don't know" and diving into unfamiliar territory with support from teammates

What Success Looks Like:
In your first few months:
You can deploy Temporal upgrades across environments with confidence
You've aut

Benefits

Health insurance

Additional Information

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn't changed - we're here to stop breaches, and we've redefined modern security with the world's most advanced AI-native platform. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We're also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We're always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About This Role:

CrowdStrike's engineering organization depends on shared infrastructure platforms that power critical product capabilities. The Temporal Platform team owns a production workflow orchestration system that serves engineering teams across the organization. You'll help operate and evolve our internal Temporal infrastructure, a stateful, distributed system running on Kubernetes across multiple regions. The work spans day-to-day operations, automation, performance tuning and capacity planning. You'll learn how to run complex infrastructure at scale while working alongside experienced platform engineers who will help you grow into broader ownership over time.

This is a growth-oriented role. We're looking for someone early in their platform engineering journey who's ready to build operational depth, develop automation skills and understand what it takes to run production infrastructure that teams depend on.