Principal Site Reliability Engineer
Job Description: Principal Site Reliability Engineer

WHAT MAKES US, US

Join some of the most innovative thinkers in FinTech as we lead the evolution of financial technology. If you are an innovative, curious, collaborative person who embraces challenges and wants to grow, learn, and pursue outcomes with our prestigious financial clients, say Hello to SimCorp!

At its foundation, SimCorp is guided by our values - caring, customer success-driven, collaborative, curious, and courageous. Our people-centered organization focuses on skills development, relationship building, and client success. We take pride in cultivating an environment where all team members can grow, feel heard, valued, and empowered. If you like what we're saying, keep reading!

WHY THIS ROLE IS IMPORTANT TO US

As a Principal Site Reliability Engineer, you will act as a technical authority across one or more Product Areas, guiding the direction of our Azure-based SaaS platform and Site Reliability strategy. You will lead complex initiatives related to platform reliability, observability, onboarding automation, and incident response across product boundaries, while advancing our overall cloud capability.

This role goes beyond hands-on engineering. You will play a key architectural and leadership role in designing scalable systems, mentoring senior engineers, and shaping our Azure governance models. With deep expertise in SRE and ITIL practices, you will elevate platform resilience and client experience across both onboarding and ongoing operations. Your impact will be felt across departments, helping SimCorp become a fully cloud-native SaaS provider with high-performing, secure, and automated service delivery.

WHAT YOU WILL BE RESPONSIBLE FOR

Act as the technical lead on SRE initiatives across multiple Product Areas
Drive forward our strategic use of Microsoft Azure in onboarding and site reliability disciplines
Architect scalable, secure, and automated solutions for client onboarding and live operations
Lead the design and evolution of cross-cutting platform capabilities (e.g., observability, CI/CD pipelines, IaC standards, DR frameworks)
Shape and govern Azure implementation patterns to ensure platform standardization, reliability, and cost-efficiency
Solve the most complex and business-critical reliability challenges involving distributed cloud systems
Advise engineering leads and product owners on cloud platform decisions, including trade-offs and risk mitigation
Collaborate with Information Security, Platform Engineering, and Architecture teams on compliance and cloud controls
Guide the definition of SLOs, SLIs, and other reliability metrics across departments
Lead root cause analysis, major incident postmortems, and reliability retrospectives across teams
Provide thought leadership, mentoring, and coaching to senior and lead engineers
Build communities of practice to strengthen SRE principles and knowledge sharing within the organization
Represent the SRE function in executive-level planning, roadmap definition, and technical due diligence
Contribute to SimCorp's overall transformation into a SaaS-first, cloud-native company

WHAT WE VALUE

Bachelor's or Master's degree in Computer Science, Engineering, or a related field
10+ years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Architecture roles
Extensive expertise in Microsoft Azure, including architecture, deployment, automation, and cost optimization
Extensive knowledge of Windows Server OS and Windows-based desktop application troubleshooting
Experience working with Citrix, AD, WAC/WSUS is a plus
Extensive knowledge of enterprise System Operations and System Administration of complicated system landscapes
Strong grasp of cloud-native and hybrid architectures, distributed systems, networking, and security
Mastery in Infrastructure as Code (IaC) using Terraform, ARM, Bicep, and related tooling
Deep knowledge of observability stacks (Azure Monitor, Log Analytics, Grafana, Application Insights)
Experience leading complex incident and problem management efforts at scale
Broad technical skillset including Kubernetes, Docker, CI/CD pipelines, SQL, APIs, and scripting
Strong foundation in ITIL processes with a strategic mindset for operational excellence
Proven ability to influence senior stakeholders, lead through ambiguity, and align engineering with business needs
Experience working in or guiding teams within regulated, security-conscious environments (e.g., financial services)
Demonstrated passion for mentorship, knowledge sharing, and building engineering culture
Ability to think strategically while delivering pragmatic, hands-on solutions