Specialist, AI Site Reliability Engineer - Core Enterprise Services

Charles Schwab · Hyderabad, India

Full-time · On-site · 4d ago

Capacity Planning · Forecasting · Incident Response · Load Testing · Observability

Responsibilities

High Availability, Resilience & Pega Platform Stability
Design and implement architectures that achieve and sustain 99.999% uptime across critical systems and support the reliability of Pega Infinity production applications
Define, measure, and track SLOs, SLIs, error budgets, and operational health metrics for both platform services and Pega applications
Build AI-powered self-healing systems with automated failover, redundancy, and graceful degradation
Perform AI-assisted capacity planning, demand forecasting, and load testing, including performance monitoring for Pega workloads
Monitor and support Pega platform components such as queues, job schedulers, agents, node types, search indexing, and background processing to maintain application stability
Production Support & Incident Management
Provide production support for Pega Infinity applications, including incident triage, troubleshooting, service restoration, and coordination during major incidents
Monitor application health, background processing, integrations, APIs, and batch performance to proactively identify and resolve issues before they impact users
Analyze failures across Pega case processing, data pages, connectors, services, decisioning flows, and asynchronous processing
Partner with development, infrastructure, database, and business teams to diagnose production defects, implement fixes, and drive preventive actions
Support application releases, hotfix deployments, patch validation, rollback planning, and operational readiness for Pega Infinity platforms
Observability & Monitoring
Design, build, and maintain AI-enhanced observability platforms covering metrics, logs, traces, and intelligent alerting
Implement AI-powered anomaly detection, predictive alerting, and proactive system health management
Leverage Gen AI to auto-generate and refine dashboards, alert rules, runbooks, and support playbooks
Build real-time availability dashboards with AI-driven trend analysis tracking reliability and support targets
Use Pega diagnostic tools, alerts, and performance data to identify bottlenecks, failed processing, and platform health issues
Root Cause Analysis & Continuous Improvement
Build AI-accelerated root cause analysis with thorough postmortems and actionable remediation
Build AI-powered diagnostic tools that automatically correlate logs, metrics, and traces
Use Gen AI to analyze incident patterns, predict recurring failures, and recommend preventive actions
Continuously reduce MTTD and MTTR through AI-assisted workflows, operational automation, and platform improvements
Drive corrective actions related to guardra

Additional Information

Your opportunity

At Charles Schwab, our purpose is simple: we champion clients' goals with passion and integrity. Guided by honesty, mutual respect and a commitment to doing what's right, we bring innovation, education, and service together to help shape financial futures. Our people are the foundation of our success - they approach their work with curiosity and collaboration, coming together to create solutions that make a meaningful impact for clients and communities. As we expand into India, we are bringing this same culture of inclusion, learning, and opportunity to new talent. Joining us means becoming part of a global team where your work matters and your future can take shape.

Our Hyderabad location is central to Schwab's growth, bringing together talented people and technology to drive innovation, scale, and efficiency. Here, you will work alongside teams who create solutions that support millions of clients every day. The work you do is more than daily operations - it's a chance to experiment, learn, and build within a value-driven, supportive environment. This is a unique opportunity to be part of our early growth phase and shape something new, backed by the stability and strength of a Fortune 500 company. Your impact begins on day one, and your contributions will help define our future in the region.

We are seeking an AI Site Reliability Engineer to join a forward-thinking engineering team that builds intelligent observability, monitoring, and deployment automation solutions using AI-augmented development practices while also providing production support for Pega Infinity applications. This role combines hands-on software engineering with operational ownership, requiring the ability to build software solutions for operational challenges, leverage Gen AI to accelerate workflows, automate incident response, and support the stability, availability, and performance of business-critical Pega platforms.
You will contribute to architecture, code, AI-driven tooling, and production readiness while partnering closely with development, platform, and business teams to resolve incidents, drive root cause analysis, and implement preventive improvements. Ideal for engineers who are curious, adaptable, and excited about working at the intersection of software engineering, AI, and enterprise application support.
