
Director Site Reliability Engineering

External
Websteronline · CT Stamford Hq
Full-time · Hybrid · 2w ago
Agile · Application Security · AWS · Capacity Planning · Documentation · GitLab


Responsibilities

  • Monitoring and Observability: Implement and maintain tools for monitoring, logging, and tracing to gain insights into system performance and health
  • Automation: Write software and scripts to automate repetitive tasks such as deployment, monitoring, and system management. Advocate for and lead automation wherever possible. Ensure environments are well managed, appropriately structured, cost-effective, and synchronized as much as possible.
  • Incident Management: Respond to incidents, troubleshoot system-level issues, and perform root cause analysis to prevent recurrence
  • Reliability Engineering: Design and build reliable and scalable systems, define Service Level Objectives (SLOs) and Indicators (SLIs), and implement reliability patterns
  • Collaboration: Work closely with software developers to ensure applications are reliable and to provide feedback on performance in a production environment
  • Documentation: Create and maintain documentation, including runbooks and system diagrams, to ensure knowledge sharing and team efficiency
  • Set a high bar for reliability and availability, and meet it through automation and relentless improvement.
  • Improve and sustain services through rigorous development, testing and release procedures.
  • Serve as a key player in deliberations on system design, platform management, and capacity planning.
  • Have a strong 'detective' mindset on why things don't work and be among the first to offer and work on solutions.
  • Be a 'link' between technologists and business stakeholders: able to have conversations with Line of Business (LoB) and technical Agile teams to work through challenges.
  • Partner with peers to advance the maturity of the DevOps practice including new/existing technologies, tools, processes, and standards. Clearly communicate expectations on technical direction and provide ongoing guidance.
  • Serve as a sounding board and technical advisor for your team in the analysis, design, and execution of solutions. Help your team anticipate unforeseen dependencies or gaps early in the SDLC.
  • From your domain's viewpoint, provide leadership and technical expertise to your Agile team to validate story points are sized appropriately, sprint plans are achievable, and releases are well-planned.

Skills and Abilities

  • Deep understanding of systems development life cycle, cloud-based systems, and application architecture.
  • Experience with multiple programming languages (Python, etc.), configuration management tools, and containers strongly preferred.
  • Familiarity with unit testing concepts and test automation frameworks (SpecFlow, SoapUI), RESTful APIs and microservices, WCF services, T-SQL, SQL queries, and stored procedures.
  • Advan

Benefits

  • Health insurance
  • Vision insurance

Additional Information

If you're looking for a meaningful career, you'll find it here at Webster. Founded in 1935, our focus has always been to put people first, doing whatever we can to help individuals, families, businesses and our colleagues achieve their financial goals. As a leading commercial bank, we remain passionate about serving our clients and supporting our communities. Integrity, Collaboration, Accountability, Agility, Respect, and Excellence are Webster's values; these set us apart as a bank and as an employer. Come join our team where you can expand your career potential, benefit from our robust development opportunities, and enjoy meaningful work!

The Director of Site Reliability Engineering is a pivotal technical leader within the Software Engineering organization, tasked with transforming how reliability, performance, and availability are achieved across our platforms. This role goes beyond maintaining systems: it reimagines and modernizes operational practices through automation, cloud-native design, and API-driven integration.

You will lead initiatives that elevate our AWS cloud architecture and MuleSoft integration ecosystem, ensuring they are secure, scalable, and resilient. By applying advanced software engineering principles and site reliability practices, you will drive a cultural and technical shift toward proactive reliability, continuous improvement, and innovation.

This role requires visionary thinking, deep technical expertise in AWS and MuleSoft, and a passion for driving change that results in more reliable, efficient, and future-ready systems.

