Disaster Recovery Planning: Develop and maintain comprehensive disaster recovery plans, including risk assessments, continuity strategies, and response procedures.
Risk Assessment: Identify potential threats and vulnerabilities, and conduct risk assessments to evaluate their impact on business operations.
Disaster Recovery Testing: Plan, execute, and evaluate regular disaster recovery exercises to validate the effectiveness of recovery plans and make necessary adjustments.
Coordination: Collaborate with IT, security, and other relevant departments to ensure alignment between disaster recovery and security strategies.
Documentation: Maintain accurate documentation of disaster recovery plans, procedures, and incident response protocols.
Incident Response: Lead disaster recovery efforts in the event of a disruption, coordinating the response and recovery activities.
Training and Awareness: Develop and deliver training for employees on disaster recovery and business continuity procedures to improve preparedness.
Continuous Improvement: Stay updated on industry best practices, emerging technologies, and evolving threats to continually improve disaster recovery capabilities.
On-Premises & Data Center Resilience
Oversee DR architecture for on-premises data centers, including storage replication (NetApp SnapMirror, Pure Storage ActiveCluster), SAN/NAS failover, and bare-metal recovery.
Manage relationships with colocation and secondary data center providers to ensure contractual alignment with DR objectives.
Drive server virtualization recovery strategies using VMware Site Recovery Manager (SRM) and Veeam.
Cybersecurity & Ransomware Recovery
Design and maintain immutable backup architectures and air-gapped environments to protect against ransomware and destructive cyberattacks.
Lead cyber recovery exercises simulating ransomware scenarios; document and refine clean-room recovery playbooks.
Collaborate with the CISO and SOC to integrate DR procedures into the Incident Response (IR) lifecycle, ensuring seamless handoffs from containment to recovery.
Champion zero-trust recovery principles, including identity verification during failover and integrity validation of recovered workloads.
Compliance, Audit & Governance
Maintain DR program alignment with ISO 22301, NIST SP 800-34, SOC 2 Type II, and applicable industry regulations (e.g., HIPAA, PCI DSS, GDPR).
Own all DR-related evidence gathering, documentation, and remediation activities for internal and external audits.
Report program health, test outcomes, and risk metrics to executive leadership and the Board on a regular cadence.
Establish governance frameworks including DR policy, standards, and exception management processes.
Testing, Drills & Continuous Improvement
Plan and execute DR tests at least twice per year, ranging from tabletop exercises to functional tests and full failovers in production-equivalent environments.
Track and drive closure of test findings; maintain a risk register for unresolved gaps.
Implement chaos engineering principles and game-day exercises to proactively uncover resilience weaknesses.
Requirements
10+ years in IT infrastructure, with at least 6 years focused on Disaster Recovery, Business Continuity, or Site Reliability Engineering.
Deep expertise designing and operating DR solutions on at least two major cloud platforms (AWS, Azure, or GCP), including cross-region replication, Route 53 / Traffic Manager / Cloud DNS failover, and managed database HA.
Extensive hands-on experience with on-premises DR technologies: VMware SRM, Veeam, Zerto, or equivalent.
Demonstrated experience building and executing ransomware recovery programs, including immutable storage and cyber recovery runbooks.
Strong command of compliance frameworks: ISO 22301, NIST SP 800-34 / CSF, SOC 2, and relevant sector regulations.
Proficiency in scripting and automation (Python, PowerShell, Bash) and
Benefits
Health insurance
Additional Information
About Northern Trust:
Northern Trust, a Fortune 500 company, is a globally recognized, award-winning financial institution that has been in continuous operation since 1889.
Northern Trust is proud to provide innovative financial services and guidance to the world's most successful individuals, families, and institutions by remaining true to our enduring principles of service, expertise, and integrity. With more than 130 years of financial experience and over 22,000 partners, we serve the world's most sophisticated clients using leading technology and exceptional service.
Job Summary:
The Lead, Resiliency Engineer will be responsible for designing, implementing, and managing our disaster recovery programs, including cyber recovery. Your strong technical background and infrastructure expertise will be critical in identifying potential risks and developing strategies to mitigate the impact of disasters. You will work collaboratively with cross-functional teams to ensure our business remains secure and resilient in the event of unforeseen disruptions, including cyberattacks.