Manager, Site Reliability Engineering
Responsibilities
- Reliability & Incident Management
  - Own end-to-end reliability of production systems, ensuring uptime within defined SLAs
  - Lead and govern a 24x7x365 incident management team, ensuring quick response and resolution
  - Act as escalation point during critical incidents and drive coordination across teams
  - Ensure proper incident tracking, communication, and status page updates
- Incident Improvement & RCA
  - Drive a strong blameless RCA culture across the team
  - Ensure all customer-impacting incidents are analysed with clear root causes
  - Track and drive closure of RCA action items to prevent repeat issues
  - Identify recurring patterns and push for permanent fixes
- Observability & Monitoring
  - Own and improve observability using tools like Datadog, CloudWatch, ELK, Prometheus
  - Guide teams on effective logging, alerting, and monitoring practices
  - Reduce alert noise and improve signal-to-noise ratio
  - Drive proactive monitoring and early detection of issues
- Automation & Operational Efficiency
  - Drive automation to reduce manual effort and operational toil
  - Identify repetitive issues and build solutions to eliminate them
  - Ensure runbooks and playbooks are created and followed for recurring incidents
- Release Stability & Production Readiness
  - Work with Product, Engineering & Platform teams to improve release quality and stability
  - Ensure proper readiness checks before production deployments (monitoring, rollback, alerts)
  - Reduce production issues caused by releases
- AI Workload Reliability
  - Support reliability and monitoring of AI/ML workloads in production and experimentation environments
  - Ensure visibility, stability, and cost awareness for AI-driven systems
  - Bring structure and best practices as AI adoption grows
- Team Leadership & Development
  - Lead and mentor a team of ~14 engineers across operations and SRE excellence
  - Build team maturity and reduce dependency on senior members
  - Develop strong ownership and accountability within the team
- Cross-team Collaboration
  - Work closely with Engineering, Product and Platform teams
  - Ensure smooth coordination during incidents and releases
  - Communicate effectively with stakeholders during high-severity situations
  - Collaborate with stakeholders to align reliability and platform str
Additional Information
Are you our "TYPE"?
Monotype Global
Named "One of the Most Innovative Companies in Design" by Fast Company, Monotype brings brands to life through type and technology that consumers engage with every day. The company's rich legacy includes a library that can be traced back hundreds of years, featuring famed typefaces like Helvetica, Futura, Times New Roman and more. Monotype also provides a first-of-its-kind service that makes fonts more accessible for creative professionals to discover, license, and use in our increasingly digital world. We work with the biggest global brands, and with individual creatives, offering a wide set of solutions that make it easier for them to do what they do best: design beautiful brand experiences.
Monotype Solutions India
Monotype Solutions India is a strategic center of excellence for Monotype and is a certified Great Place to Work® three years in a row. The focus of this fast-growing center spans Product Development, Product Management, Experience Design, User Research, Market Intelligence, Research in areas of Artificial Intelligence and Machine Learning, Innovation, Customer Success, Enterprise Business Solutions, and Sales.
Headquartered in the Boston area of the United States and with offices across 4 continents, Monotype is the world's leading company in fonts and a trusted partner to the world's top brands.
We are looking for an experienced and hands-on Site Reliability Engineering (SRE) Manager to lead the reliability, stability, and operational excellence of our enterprise platforms. This role will own both 24x7 incident management operations and SRE engineering efforts, ensuring high system availability, fast incident response, and continuous improvement of platform reliability. You will lead a team responsible for maintaining uptime, reducing incidents, improving response times, and building a more proactive and self-sufficient SRE function. The role requires a balance of hands-on technical depth and people leadership, with a strong focus on automation, observability, release stability, and team maturity. As we expand into AI-driven workloads, you will also support reliability, monitoring, and scalability of these systems.