Senior Site Reliability Engineer
Job Description: Senior Site Reliability Engineer, IC4

WHAT MAKES US, US

Join some of the most innovative thinkers in FinTech as we lead the evolution of financial technology. If you are an innovative, curious, collaborative person who embraces challenges and wants to grow, learn and pursue outcomes with our prestigious financial clients, say Hello to SimCorp!

At its foundation, SimCorp is guided by our values: caring, customer success-driven, collaborative, curious, and courageous. Our people-centered organization focuses on skills development, relationship building, and client success. We take pride in cultivating an environment where all team members can grow, feel heard, valued, and empowered.

If you like what we're saying, keep reading!

WHY THIS ROLE IS IMPORTANT TO US

As a Senior Site Reliability Engineer, you will work on Cloud Native Products & Services, taking ownership of responsibility domains such as monitoring, observability, release management, vulnerability management, cost management, and audit & compliance. You will work closely with DevOps engineers, clients, and stakeholders to ensure reliability, performance, and automation for both existing and new cloud native products & services, and for both onboarding and long-running clients on them. Your contributions will drive stability, continuous improvement, and operational excellence in our Azure-based environments. This role blends hands-on engineering, incident response, platform configuration, and service quality, guided by ITIL and SRE best practices.

WHAT YOU WILL BE RESPONSIBLE FOR

Support the operation and enhancement of mission-critical environments for both new and existing Cloud Native products & services
Collaborate with product development teams to enhance monitoring, observability, reliability, and performance of these services
Collaborate deeply across engineering teams to understand systems at the code level
Manage and improve our infrastructure deployment pipelines and troubleshoot onboarding and operational issues
Drive capacity planning efforts to ensure our platform is resilient and scalable as we grow
Build tools and automation to eliminate manual toil and improve engineering velocity, developer experience, and system reliability
Define and manage SLOs and error budgets in partnership with Engineering teams
Contribute to incident, problem, and change management processes
Execute disaster recovery, configuration management, and platform readiness tasks
Work flexibly in regular and evening shifts on a rotational basis, and provide weekend or on-call support as needed
Collaborate with Agile teams and take part in design discussions with clients, vendors, and stakeholders
Contribute to knowledge sharing across multiple Product Areas
Leverage a strong foundation in ITIL practices, including problem, change, and incident management

WHAT WE VALUE

Bachelor's degree in Computer Science or related field (Master's is a plus)
3+ years in Site Reliability, DevOps, or Cloud Engineering roles
Expertise with Microsoft Azure Cloud (required)
Expertise in Infrastructure as Code (IaC) using Bicep, ARM and Terraform
Solid experience in monitoring and logging tools (Azure Monitor, Application Insights, DataDog, Log Analytics)
Hands-on experience in IdP onboarding, and in integrating and configuring IdP solutions like Azure Entra ID, Okta, Keycloak or PingFederate
Experience in centralizing authentication, managing user identities, and implementing secure access protocols (SAML, OAuth, OIDC)
Experience working with observability frameworks like OpenTelemetry and distributed tracing systems
Experience working with application reliability platforms like Checkly or equivalent
Experience setting up synthetic monitoring using Playwright or equivalent
Knowledge of AI/ML-based anomaly detection, log aggregation and analysis tools like Microsoft Azure Anomaly Detector or equivalent
Experience working with Microsoft Defender Suite (EDR, XDR) and Sentinel
Proficient in KQL for threat hunting and improving compliance scores using Defender for Cloud
Able to identify and remediate vulnerabilities
Understanding of networking and containerization (Kubernetes, Docker)
Good understanding of APIs, scripting languages like PowerShell, Bash, Kusto and databases like SQL, Cosmos DB and PostgreSQL
Familiarity with SimCorp Dimension & Salesforce is a plus
Proficiency in IT service management (ITSM) frameworks like ITIL, focusing on incident, change, and problem management to improve operational efficiency
Experience managing both onboarding projects and live production operations
Collaborative mindset and ability to work in cross-functional teams
Interest in continuous learning and growth within your Product Area