Senior Site Reliability Engineer
About the role
Investment Services is Morningstar's internal product group focused on building and maintaining the platforms that power our global data operations. We enable the Managed Investment Data (MID), Reference Entity Data (RED), Fixed Income, Third Party Data, and Manager Research teams to collect, process, and deliver high-quality investment data at scale, supporting over 770,000 investments across thousands of global processes. We design and maintain the internal tools and systems that support data collection for operational, performance, portfolio, and document data; enable automation and AI-assisted workflows; improve analyst productivity and experience; and ensure data quality, scalability, and system stability. Our platforms are used by hundreds of analysts across the globe to process billions of data points every month.
Location: Toronto, ON (4 days onsite)
Responsibilities
- Design, build, and improve CI/CD pipelines to accelerate software delivery while maintaining stability and security across our platform.
- Provision, configure, and maintain cloud infrastructure on AWS using Infrastructure as Code tools such as Terraform, CDK, or CloudFormation.
- Provide on-call technical triage and troubleshooting, driving incidents to resolution and conducting thorough post-incident reviews.
- Lead cross-team reliability initiatives, including disaster recovery planning, security compliance, and AWS resource optimization.
- Deploy and manage containerized applications using Docker and AWS ECS/EKS, optimizing resource utilization and deployment strategies.
- Drive automation and innovation for proactive monitoring, alerting, and continuous operational improvement using tools such as Splunk, CloudWatch, New Relic, and Harness.
- Collaborate with software engineers and data engineers to embed SRE best practices into the development lifecycle, including SLOs, error budgets, and capacity planning.
- Write scripts and tooling in Python, Bash, or other scripting languages to automate routine operational tasks and streamline deployments.
- Document infrastructure architecture, deployment processes, and operational runbooks to enable transparency, consistency, and long-term maintainability.
- Collaborate with globally distributed teams for projects, knowledge transfer, and on-call rotation coverage.
- Leverage AI-assisted development tools (e.g., GitHub Copilot, Claude Code) to accelerate engineering workflows and improve productivity.
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure roles supporting production systems.
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- Strong hands-on experience with AWS cloud services (EC2, S3, ECS/EKS, Lambda, RDS, VPC, IAM, Route 53, CloudWatch).
- Proficiency with Infrastructure as Code tools such as Terraform, CDK, or CloudFormation.
- Experience building and maintaining CI/CD pipelines using tools such as Jenkins, Harness, GitHub Actions, or similar platforms.
- Strong working knowledge of Docker containers and container orchestration platforms.
- Proficiency in scripting languages such as Python or Bash for automation and operational tooling.
- Solid understanding of Linux/Unix system administration and networking fundamentals.
- Experience with monitoring, logging, and alerting tools such as Splunk, New Relic, CloudWatch, or Datadog.
- Knowledge of SRE principles, including SLIs/SLOs, error budgets, incident management, and post-incident review processes.
- Experience using AI-assisted development tools (e.g., GitHub Copilot, Claude Code).
- Excellent communication and collaboration skills, with the ability to work effectively across distributed teams and explain infrastructure decisions clearly.
- AWS certifications (e.g., Solutions Architect, DevOps Engineer, SysOps Administrator).
- Experience designing or supporting disaster recovery and business continuity strategies.
- Familiarity with security compliance frameworks and implementing security best practices in cloud environments.
- Experience with serverless architectures using AWS Lambda, SAM, or the Serverless Framework.
- Experience supporting distributed engineering teams across multiple time zones.
- Exposure to data pipeline infrastructure or platforms used for large-scale data processing.
- FinOps certification or experience with cloud financial management and cost optimization practices.
- Base Salary Compensation Range: $90,489.00-$132,711.00
- Incentive Target Percentage: 12.5% Annual