Specialist, AI Site Reliability Engineer - Core Enterprise Services
Responsibilities
High Availability, Resilience & Pega Platform Stability
- Design and implement architectures that achieve and sustain 99.999% uptime across critical systems and support the reliability of Pega Infinity production applications
- Define, measure, and track SLOs, SLIs, error budgets, and operational health metrics for both platform services and Pega applications
- Build AI-powered self-healing systems with automated failover, redundancy, and graceful degradation
- Perform AI-assisted capacity planning, demand forecasting, and load testing, including performance monitoring for Pega workloads
- Monitor and support Pega platform components such as queues, job schedulers, agents, node types, search indexing, and background processing to maintain application stability
Production Support & Incident Management
- Provide production support for Pega Infinity applications, including incident triage, troubleshooting, service restoration, and coordination during major incidents
- Monitor application health, background processing, integrations, APIs, and batch performance to proactively identify and resolve issues before they impact users
- Analyze failures across Pega case processing, data pages, connectors, services, decisioning flows, and asynchronous processing
- Partner with development, infrastructure, database, and business teams to diagnose production defects, implement fixes, and drive preventive actions
- Support application releases, hotfix deployments, patch validation, rollback planning, and operational readiness for Pega Infinity platforms
Observability & Monitoring
- Design, build, and maintain AI-enhanced observability platforms covering metrics, logs, traces, and intelligent alerting
- Implement AI-powered anomaly detection, predictive alerting, and proactive system health management
- Leverage Gen AI to auto-generate and refine dashboards, alert rules, runbooks, and support playbooks
- Build real-time availability dashboards that use AI-driven trend analysis to track reliability and support targets
- Use Pega diagnostic tools, alerts, and performance data to identify bottlenecks, failed processing, and platform health issues
Root Cause Analysis & Continuous Improvement
- Conduct AI-accelerated root cause analysis, backed by thorough postmortems and actionable remediation
- Build AI-powered diagnostic tools that automatically correlate logs, metrics, and traces
- Use Gen AI to analyze incident patterns, predict recurring failures, and recommend preventive actions
- Continuously reduce MTTD and MTTR through AI-assisted workflows, operational automation, and platform improvements
- Drive corrective actions related to guardra
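The 99.999% uptime target and the error budgets listed above come down to simple arithmetic. The sketch below is a minimal illustration, not Schwab or Pega tooling: it assumes a time-based budget over a fixed window and a request-based SLI defined as good events divided by total events, and every function and class name in it is hypothetical.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(slo_target: float, window_days: int) -> float:
    """Downtime the error budget permits over a window (assumes a time-based SLO)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * window_days * MINUTES_PER_DAY


@dataclass
class RequestCounts:
    # Hypothetical request-based SLI inputs for one window.
    good: int   # requests that met the SLI criterion (e.g. succeeded within a latency bound)
    total: int  # all eligible requests in the window


def sli(counts: RequestCounts) -> float:
    """Request-based SLI: fraction of good events (1.0 when there is no traffic)."""
    return counts.good / counts.total if counts.total else 1.0


def error_budget_remaining(counts: RequestCounts, slo_target: float) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    allowed_bad = (1.0 - slo_target) * counts.total
    actual_bad = counts.total - counts.good
    if allowed_bad == 0:
        # No traffic means no budget and no failures to spend it on.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(0.99999, 365):.2f} min of downtime per year")
    print(f"{allowed_downtime_minutes(0.99999, 30) * 60:.1f} s of downtime per 30 days")
    counts = RequestCounts(good=9_999_995, total=10_000_000)
    print(f"SLI={sli(counts):.7f}, budget left={error_budget_remaining(counts, 0.99999):.0%}")
```

At 99.999%, the budget works out to roughly 5.26 minutes of downtime per year, or about 26 seconds per 30-day window. At that scale, detection and failover generally have to be automated, because a manual response alone would usually exhaust the budget.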
Additional Information
Your opportunity

At Charles Schwab, our purpose is simple: we champion clients' goals with passion and integrity. Guided by honesty, mutual respect and a commitment to doing what's right, we bring innovation, education, and service together to help shape financial futures. Our people are the foundation of our success; they approach their work with curiosity and collaboration, coming together to create solutions that make a meaningful impact for clients and communities. As we expand into India, we are bringing this same culture of inclusion, learning, and opportunity to new talent. Joining us means becoming part of a global team where your work matters and your future can take shape.

Our Hyderabad location is central to Schwab's growth, bringing together talented people and technology to drive innovation, scale, and efficiency. Here, you will work alongside teams who create solutions that support millions of clients every day. The work you do is more than daily operations; it's a chance to experiment, learn, and build within a value-driven, supportive environment. This is a unique opportunity to be part of our early growth phase and shape something new, backed by the stability and strength of a Fortune 500 company. Your impact begins on day one, and your contributions will help define our future in the region.

We are seeking an AI Site Reliability Engineer to join a forward-thinking engineering team that builds intelligent observability, monitoring, and deployment automation solutions using AI-augmented development practices, while also providing production support for Pega Infinity applications. This role combines hands-on software engineering with operational ownership. It requires the ability to build software solutions for operational challenges, leverage Gen AI to accelerate workflows, automate incident response, and support the stability, availability, and performance of business-critical Pega platforms.
You will contribute to architecture, code, AI-driven tooling, and production readiness while partnering closely with development, platform, and business teams to resolve incidents, drive root cause analysis, and implement preventive improvements. This role is ideal for engineers who are curious, adaptable, and excited about working at the intersection of software engineering, AI, and enterprise application support.