Principal Site Reliability Engineer - SaaS
Responsibilities
- Lead the design, implementation, and management of systems and solutions to improve system reliability and scalability
- Automate operational tasks and processes to improve efficiency, reduce manual intervention, and enhance system performance
- Monitor system performance and availability, proactively identifying and resolving issues before they impact users or services
- Collaborate with development teams to integrate reliability-focused practices into the software development lifecycle
- Conduct root cause analysis for incidents and outages, implementing long-term fixes and preventive measures
- Develop and maintain infrastructure monitoring, alerting, and logging systems to ensure visibility and proactive management of systems
- Ensure the security, performance, and scalability of cloud-based infrastructure, following best practices for configuration management and deployment
- Provide mentorship and guidance to junior engineers, promoting best practices and knowledge-sharing within the team
- Engage with cross-functional teams to ensure reliability and performance are prioritized across the organization's systems and services
Type of Work This Role Could Engage With
- Leading efforts to design and implement solutions to enhance the reliability, scalability, and availability of critical systems
- Collaborating with development teams to ensure that systems and services are built with reliability in mind
- Automating monitoring, incident management, and infrastructure provisioning to streamline operations and enhance efficiency
- Analyzing data from system performance metrics to identify trends, bottlenecks, and areas for improvement
- Participating in incident response and on-call rotations, ensuring rapid resolution of critical issues
- Building and managing infrastructure as code (IaC) to ensure consistent and scalable system deployment
- Engaging in capacity planning and performance tuning to meet business requirements and optimize system performance
- Continuously evaluating and incorporating emerging technologies, tools, and practices to improve system reliability
WHAT WE VALUE
- Site Reliability Engineering Expertise: deep knowledge of SRE principles, practices, and tools to maintain and improve system reliability
- Cloud Infrastructure Proficiency: hands-on experience with cloud platforms (AWS, Azure, GCP) and infrastructure-as-code tools
- Automation & Scripting: strong ability to automate operational tasks, reducing manual toil and improving system efficiency
- Monitoring & Observability: experience building and maintaining robust monitoring, alerting, and logging systems
- Incident Management & Root Cause Analysis: proven ability to respond to incidents rapidly and implement lasting preventive measures
- Collaboration & Communication: ability to work effectively across development, operations, and IT teams to drive reliability outcomes
- Mentorship & Leadership: commitment to growing junior team members and fostering a culture of engineering excellence
- Analytical Mindset: data-driven approach to performance tuning, capacity planning, and continuous improvement
Additional Information
Please note: This role is open to Saudi nationals only. Not a Saudi national? We have many exciting opportunities across our organization, and we encourage you to explore other roles at SimCorp.
WHAT MAKES US, US
Join some of the most innovative thinkers in FinTech as we lead the evolution of financial technology. If you are an innovative, curious, collaborative person who embraces challenges and wants to grow, learn, and pursue outcomes with our prestigious financial clients, say Hello to SimCorp!
At its foundation, SimCorp is guided by our values: caring, customer success-driven, collaborative, curious, and courageous. Our people-centered organization focuses on skills development, relationship building, and client success. We take pride in cultivating an environment where all team members can grow and feel heard, valued, and empowered. If you like what we're saying, keep reading!
WHY THIS ROLE IS IMPORTANT TO US
The Principal Site Reliability Engineer at SimCorp, at the IC6 grade, is responsible for ensuring the availability, reliability, and performance of mission-critical systems and services. With typically 5-8 years of experience in site reliability engineering, cloud infrastructure, or a related field, this role works independently to design and implement robust solutions, automate processes, and proactively address system issues. The Principal Site Reliability Engineer collaborates closely with development, operations, and IT teams to ensure high-quality service delivery and operational excellence.