Principal Site Reliability Engineer - SaaS

SimCorp · Riyadh, Saudi Arabia

Full-time · Remote · Today

AWS · Azure · Capacity Planning · GCP · Incident Response · Leadership

Responsibilities

Lead the design, implementation, and management of systems and solutions to improve system reliability and scalability
Automate operational tasks and processes to improve efficiency, reduce manual intervention, and enhance system performance
Monitor system performance and availability, proactively identifying and resolving issues before they impact users or services
Collaborate with development teams to integrate reliability-focused practices into the software development lifecycle
Conduct root cause analysis for incidents and outages, implementing long-term fixes and preventive measures
Develop and maintain infrastructure monitoring, alerting, and logging systems to ensure visibility and proactive management of systems
Ensure the security, performance, and scalability of cloud-based infrastructure, following best practices for configuration management and deployment
Provide mentorship and guidance to junior engineers, promoting best practices and knowledge-sharing within the team
Engage with cross-functional teams to ensure reliability and performance are prioritized across the organization's systems and services

Type of Work This Role Could Engage With

Leading efforts to design and implement solutions to enhance the reliability, scalability, and availability of critical systems
Collaborating with development teams to ensure that systems and services are built with reliability in mind
Automating monitoring, incident management, and infrastructure provisioning to streamline operations and enhance efficiency
Analyzing data from system performance metrics to identify trends, bottlenecks, and areas for improvement
Participating in incident response and on-call rotations, ensuring rapid resolution of critical issues
Building and managing infrastructure as code (IaC) to ensure consistent and scalable system deployment
Engaging in capacity planning and performance tuning to meet business requirements and optimize system performance
Continuously evaluating and incorporating emerging technologies, tools, and practices to improve system reliability

WHAT WE VALUE

Site Reliability Engineering Expertise: deep knowledge of SRE principles, practices, and tools to maintain and improve system reliability
Cloud Infrastructure Proficiency: hands-on experience with cloud platforms (AWS, Azure, GCP) and infrastructure-as-code tools
Automation & Scripting: strong ability to automate operational tasks, reducing manual toil and improving system efficiency
Monitoring & Observability: experience building and maintaining robust monitoring, alerting, and logging systems
Incident Management & Root Cause Analysis: proven ability to respond to incidents rapidly and implement lasting preventive measures
Collaboration & Communication: ability to work effectively across development, operations, and IT teams to drive reliability outcomes
Mentorship & Leadership: commitment to growing junior team members and fostering a culture of engineering excellence
Analytical Mindset: data-driven approach to performance tuning, capacity planning, and continuous improvement

Benefits

Vision insurance

Additional Information

Please note: This role is open to Saudi nationals only. Not a Saudi national? We have many exciting opportunities across our organization, and we encourage you to explore other roles at SimCorp.

WHAT MAKES US, US

Join some of the most innovative thinkers in FinTech as we lead the evolution of financial technology. If you are an innovative, curious, collaborative person who embraces challenges and wants to grow, learn and pursue outcomes with our prestigious financial clients, say Hello to SimCorp!

At its foundation, SimCorp is guided by our values: caring, customer success-driven, collaborative, curious, and courageous. Our people-centered organization focuses on skills development, relationship building, and client success. We take pride in cultivating an environment where all team members can grow, feel heard, valued, and empowered. If you like what we're saying, keep reading!

WHY THIS ROLE IS IMPORTANT TO US

The Principal Site Reliability Engineer at SimCorp, at the IC6 grade, is responsible for ensuring the availability, reliability, and performance of mission-critical systems and services. With typically 5-8 years of experience in site reliability engineering, cloud infrastructure, or a related field, this role works independently to design and implement robust solutions, automate processes, and proactively address system issues. The Principal Site Reliability Engineer collaborates closely with development, operations, and IT teams to ensure high-quality service delivery and operational excellence.

Interested in this role?

Apply on the company's website.