Senior Site Reliability Engineer
Job Description

Building trusted markets - powered by our people

At Cboe Global Markets, we inspire our people to solve complex challenges together because what we do matters. We provide the financial infrastructure that powers the global economy. As a leading provider of market infrastructure and tradable products, Cboe delivers cutting-edge trading, clearing and investment solutions to market participants around the world.

We're building meaningful ways to support professional and personal development while strengthening the trust we've earned as a global market leader. Our teams are empowered to share ideas, actively pursue them and bring on a challenge. As champions of internal mobility and access to opportunity, we encourage our people to "go for it" and equip our managers with the training to coach their teams to the next level. We strive to provide employees a safe space to network, share ideas and create opportunities.

Sound like the place for you? Join us!

Role Overview

The Sr. Site Reliability Engineer (Singapore) role is for veteran technologists with a diverse set of skills ranging from software development to systems, network, application, and/or database management, with deep subject matter expertise in one or more of these disciplines. This position is instrumental in Cboe's follow-the-sun support model for its US Global Trading Hours (GTH) markets, providing critical overnight and early-session coverage from Singapore that ensures continuous, high-availability operations across Cboe's real-time, low-latency trading platforms. The Singapore-based Sr. SRE operates as a highly skilled member of a globally coordinated team responsible for platform engineering, configuration management, implementation, capacity planning, performance tuning, analysis, troubleshooting, reporting, and process automation.
This individual provides technical support to Cboe Trade Desk and Operations Support Center staff across time zones, and works closely with Software Engineering, Systems Engineering, and Network Engineering teams to troubleshoot complex issues and coordinate platform configuration updates. A Sr. Site Reliability Engineer must be able to work independently with little to no direct supervision. Because of follow-the-sun coverage needs, work schedules may shift during the year around daylight saving time changes to keep support schedules overlapping with other regions, and shift start times may be as early as 5-6 AM SGT.

Major Job Duties

Platform Configuration Management: Provide configuration management of new and existing trading platforms and support the implementation of new features and functionality based on new business requirements. Monitor development activities and change management tickets, and evaluate their impact on Cboe Operations. Execute daily change tickets assigned to Site Reliability Engineering in support of updates to production, disaster recovery, and certification systems. While the primary focus of this role is support of bare-metal, on-premises infrastructure, experience with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes) is desirable as Cboe continues to evolve its infrastructure strategy.

Incident Response & Technical Troubleshooting: Serve as a primary technical responder for production incidents occurring during US GTH market hours, covered from the Singapore time zone. Lead and coordinate incident triage, root cause analysis, and resolution across globally distributed engineering and operations teams. Provide timely, precise communication to stakeholders during active incidents, and contribute to post-incident reviews and remediation tracking to drive long-term platform stability.
Technical Leadership & System Availability: Provide technical leadership, support, and operational oversight to sustain the resiliency and high availability of critical business operations. Monitor production, disaster recovery, and certification systems for issues. Analyze and optimize the performance of real-time trading platforms. Operate and maintain low-latency bare-metal infrastructure, including hardware health, Linux OS tuning, and kernel-bypass networking stacks such as Solarflare/Onload. Investigate software defects. Assist the build team in resolving build/deployment issues.

Capacity Planning: Drive capacity planning decisions for Cboe Exchanges and support the capacity planning needs of various Cboe business units. Be an active voice in Capacity Planning meetings with engineering and technical operations management staff.

Automation & Process Improvement: Provide thought leadership in identifying task automation opportunities, including automation of system health monitors, alerts, and remediations. Support automation efforts through the development, testing, and maintenance of Python tools. Leverage AI to maximize efficiency.

Reporting & Data Analysis: Create and improve upo