Production Engineering Lead

External

Veeam Software · Pune, India

Full-time · On-site · 1w ago

AWS · Azure · Documentation · GCP · Grafana · Incident Response

About the role

We're building a new global Production Engineering (Prod Eng) function to serve as the bridge between Technical Support and Engineering. As Supervisor, you will lead your regional team to reduce the time it takes to resolve customer-impacting issues, empower Technical Support with better escalation pathways, and keep SRE focused on strategic reliability work. Your team will be part of a 24×7 follow-the-sun model, running operations for Veeam Data Cloud: handling day-to-day tasks, troubleshooting problems, identifying incidents, and owning escalations from Technical Support. You will work with our SRE and Product Development teams to turn insights into systemic improvements, making our products more resilient while improving customer satisfaction at scale.

Responsibilities

Regional Leadership & Team Enablement
Lead your region's Production Engineering team's day-to-day execution, including shift planning, priority setting, and workload balancing.
Coach and support engineers through investigations, ticket handling, incident response, and customer-impacting escalations.
Partner with the Production Engineering Manager on hiring engineers, onboarding, performance feedback, and career development for the region.
Ensure sustainable operations: fair scheduling, recovery time, and proactive burnout prevention.
Escalations & Incident Partnership
Own regional coverage for escalations from Technical Support, ensuring timely triage, communication, and resolution.
Coordinate mitigations and fixes with Product Engineering, SRE, and Support - maintaining a clear single-threaded owner per escalation.
Drive high-quality customer-impacting issue updates: crisp status, impact, next steps, and expected timelines.
Identify incidents and participate in proactive incident response during regional daytime hours.
Operational Excellence & Continuous Improvement
Maintain and improve runbooks, on-call playbooks, and escalation pathways between Support ↔ Prod Eng ↔ SRE ↔ Product Engineering.
Identify recurring issues and systemic pain points; turn them into actionable engineering work (bugs, reliability improvements, automation, documentation).
Reduce handle times and operational friction by improving tooling and process workflows (e.g., SNOW ↔ Jira handoffs, ownership clarity, and "ticket → code → docs" closure).
Observability & Diagnostic Readiness
Ensure services and operational workflows are "debuggable by default" through strong logging, metrics, tracing, and alert hygiene.
Partner with SRE and platform teams to improve telemetry standards, dashboards, and escalation signals.
Contribute to creating and maintaining a reliable escalation experience for Support and field teams (e.g., known issues, standard diagnostics, and common mitigations).

Requirements

Required
6+ years of experience in an operations, systems engineering, or technical support role.
Prior experience leading a small team or serving as a senior/primary escalation point.
Experience triaging and coordinating resolution for user-impacting issues.
Experience with ticketing systems (e.g., ServiceNow, Jira) and building clean handoff processes.
Strong troubleshooting skills across various integrated applications and systems.
Comfort working across functions (Support, Engineering, SRE, Product) and across time zones.
Clear written and verbal communication, especially during escalations and incidents.
Preferred
Working knowledge of incident response practices (triage, mitigation, communication, post-incident follow-up).
Prior experience with Veeam backup software, Microsoft 365, or Salesforce.
Experience with public cloud environments (Azure preferred; AWS/GCP also valued).
Familiarity with observability tooling and practices (e.g., OpenTelemetry, Prometheus, Grafana, logging platforms).
Experience improving operational workflows and reducing toil through automation.
Why Join Us
Define and lead a new global Production Engineering function from inception.
Drive measurable improvements in handle times, escalation quality, and customer satisfaction.
Collaborate closely with SRE and Engineering leadership to build resilient, scalable systems.
Work in a global environment with competitive compensation and opportunities for growth.
What You'll Get
18 paid vacation days

Benefits

Paid time off

Additional Information

Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world's biggest brands.

Interested in this role?

Apply on the company's website.