Senior Reliability Engineer - Engineering & Reliability @ ING Bank

ING · Expo Business Park

Full-time · Hybrid · 2w ago

Azure · Budget Management · Capacity Planning · Chaos Engineering · CI/CD · DNS

About the role

The Engineering & Reliability team in ING Romania is part of the global Engineering & Reliability organization, bringing together capabilities across Developer Experience, Site Reliability Engineering (SRE), and IT Resilience. The team plays a key role in rolling out and maturing SRE practices within ING Romania, with a strong focus on improving the reliability of Critical Business Services and driving end-to-end resilience across complex, distributed systems.

Your day-to-day

The initial focus will be to develop, innovate, mature and implement reliability engineering practices focused on resilience by design and systemic failure prevention. In this role you will:

- Establish and enforce reliability patterns, validate implementations through engineering controls and ensure their integration within CI/CD and runtime environments.
- Work closely with Enterprise Architecture, DevOps teams and Global Engineering & Reliability to ensure alignment with the resilience patterns applied to monoliths, microservices, pipelines, databases, etc.
- Contribute, together with the SRE team, to identifying and eliminating systemic reliability weaknesses, ensuring end-to-end visibility, failure isolation and recovery mechanisms.
- Focus on the resilience and reliability engineering of critical business services, including dependency mapping, failure mode identification, resilience pattern validation and observability correlation.
- Mature resilience engineering and failure testing practices (chaos engineering, fault injection).
- Mature error budget management as a primary decision-making mechanism.
- Drive toil reduction programs by identifying repetitive manual work and automating it.
- Mature major incident and problem management with blameless post-mortems.
- Mature reliability lifecycle practices covering the prevent, detect, respond and improve cycles across the full SDLC.
- Mature observability into an actionable system that enables early detection and diagnosis.
- Mature capacity planning and performance engineering in line with system growth.
- Drive adoption of reliability testing, continuous validation and well-architected reviews.
- Act as a key enabler of reliability for Digital Products by driving adoption of the One Engineering System (1ES).

What you bring to the team

- A bachelor's or master's degree in computer science, information systems or a related discipline.
- Relevant experience in SRE, platform engineering and/or software architect roles, with strong experience in distributed systems and resilience design.
- Deep understanding of distributed systems, failure patterns and cloud-native architectures.
- Containers and orchestration platforms (Docker, Kubernetes).
- Microservices architectures (service mesh, API-driven systems, eventual consistency).
- CI/CD pipelines and automation frameworks (Azure DevOps, GitOps, pipeline-as-code).
- Networking fundamentals (TCP/IP, DNS, load balancing, service discovery).
- Databases (SQL/NoSQL, replication, consistency models).
- Software development (Java, Kotlin, Python or similar).
- Advanced observability practices.
- Strong experience integrating AI/ML capabilities into operations (AIOps, anomaly detection, predictive alerting, automation).

To…

Additional Information

Mission

The SRE team is responsible for rolling out SRE practices to improve the reliability of ING's critical services in Romania. The Reliability Authority role extends these practices by focusing on deep engineering, systemic resilience and end-to-end reliability of Critical Business Services. The role:

- Develops, innovates, matures and implements advanced SRE reliability practices and related IT processes across ING, in cooperation with the Global Engineering & Reliability teams.
- Integrates the standards for resilience engineering, ensuring systems are built to withstand and recover from failure.
- Ensures proper documentation, training material and knowledge transfer across engineering teams within ING Romania.
- Contributes as a reliability expert to key operational activities, with a focus on services or incidents touching multiple areas. This includes performing critical business service / critical chain reviews to identify weaknesses, conducting deep failure-mode analysis, supporting global major incidents as technical authority, and ensuring high-quality systemic root-cause analysis and structural follow-up across Tech domains, while continuously driving engineering principles such as toil reduction, an automation-first mindset and reliability improvement cycles.

As a Senior Reliability Engineer, you will enable the reliability and scalability of ING's Digital Products by leveraging the One Engineering System (1ES), the bank's standardized engineering platform. You will ensure that products are built using consistent tooling, golden paths, and integrated observability and delivery pipelines, embedding reliability, security, and quality by design. Through 1ES adoption, you will drive improved availability, faster recovery, and reduced operational risk while empowering teams to deliver changes safely and efficiently at scale.