Lead Critical Incident Resolution: Personally drive the resolution of high-severity, multi-system failures. Manage the end-to-end incident lifecycle, from initial triage and communication to post-incident review.
Deep-Dive Analysis: Conduct expert-level Root Cause Analysis (RCA) on persistent order failures, leveraging deep knowledge of logs, databases, APIs, and microservices to identify systemic architectural or integration flaws.
Strategic Problem Management: Champion problem management by translating technical findings into actionable recommendations for Engineering and Product teams.
Proactive Operational Improvement & Innovation
Lead Continuous Improvement: Proactively identify operational bottlenecks and lead initiatives to optimize processes, reduce manual toil, and increase system reliability.
Architectural Influence: Provide strategic recommendations to Product and Engineering to influence future design, architecture, and feature enhancements.
Operational Excellence: Lead the design and implementation of scalable monitoring solutions (APM, synthetic transactions) to shift from reactive troubleshooting to proactive detection.
Automation: Drive the development and implementation of automation tools for data repair, diagnostics, and routine operational tasks.
Leadership & Mentorship
Technical Mentorship: Provide expert guidance to L1/L2 support engineers, fostering a culture of technical rigor and career growth.
Team Management: Oversee incident backlogs, define key operational KPIs, conduct performance reviews, and ensure high standards in knowledge management (runbooks, documentation).
Cross-Functional Liaison: Act as the primary technical SME, representing the support organization in architectural reviews and collaborating with Product, IT, and Network teams.
Mandatory Qualifications:
Bachelor's degree in CS/IT or a related field.
More than 5 years of hands-on Java development (microservices/distributed systems).
More than 5 years of experience with Order Management/Billing platforms, including solution design.
Proficiency in Couchbase (N1QL, data modeling) and SQL.
Job Description:
Business Overview
The Technology Platforms Division (TPD) drives the growth of the Rakuten Ecosystem by delivering innovative, high-quality technology platforms characterized by integrated control and strategic partnerships.
Within TPD, the Telecom Business Application Supervisory Department (TBASD) develops and maintains a unified, high-quality Business Support System (BSS) for Rakuten Mobile. We deliver agile, scalable solutions across the customer lifecycle and continuously enhance system performance through close collaboration with stakeholders.
Department Overview
The BSS Ops Department (BSOPD) is responsible for operating a high-quality Business Support System (BSS) that integrates with the broader Rakuten Ecosystem, directly contributing to maximizing Rakuten Mobile's business. These BSS platforms are critical for managing telecommunication business operations. Additionally, we provide excellent customer support and facilitate all BSS integrations.
Position:
Why We Hire
We are seeking a highly technical, hands-on leader to own the stability and operational excellence of our O2A (Order-to-Activate) ecosystem. You will serve as the technical authority for high-impact production incidents while proactively driving systemic improvements. You will bridge the gap between reactive support and proactive engineering, leading initiatives that reduce technical debt, enhance observability, and optimize team performance.
Position Details