Cloud Solution Architecture - Infrastructure
About the role
The Infrastructure Cloud Solution Architect (CSA) serves as a trusted technical advisor for Microsoft's most strategic and mission-critical customers. This role helps customers improve the reliability, resilience, security, performance, and operational excellence of their Azure environments through proactive assessments, technical guidance, incident leadership, and cross-functional collaboration. Working within a global follow-the-sun operating model, the CSA collaborates closely with customers, Microsoft Engineering, Support, and Customer Success teams across multiple regions and time zones to drive rapid incident resolution, operational improvements, and long-term business outcomes. Success requires deep technical expertise, strong customer advocacy, and the ability to navigate complex operational challenges while influencing stakeholders across diverse organizations and cultures.
Responsibilities
Trusted Advisor & Customer Advocacy
- Act as a trusted technical advisor, helping customers improve the reliability, resiliency, security, performance, and operational maturity of mission-critical workloads running on Azure.
- Advise customers and stakeholders on architecture, operations, and best practices aligned with the Azure Well-Architected Framework.
- Actively listen to and understand customer priorities, advocate on their behalf within Microsoft, and drive outcomes measured through customer satisfaction, operational excellence, and business impact.
- Build strong technical relationships with customers and Microsoft stakeholders, establishing credibility through deep technical expertise and trusted guidance.
- Communicate complex technical concepts and recommendations in clear, actionable terms to both technical and executive audiences.
Incident Leadership & Operational Excellence
- Lead complex troubleshooting efforts across infrastructure, platform, and application layers, including critical and high-severity incidents.
- Operate effectively in high-stakes, customer-impacting incidents, combining platform expertise and customer business context to accelerate mitigation, recovery, and restoration of service.
- Facilitate Root Cause Analysis (RCA) activities for critical incidents, helping customers identify corrective and preventative actions that reduce future risk.
- Analyze support cases, operational telemetry, incident trends, and platform events to identify recurring risks and recommend proactive remediation measures.
- Drive reduction of reactive operational demand through reliability-focused recommendations, operational maturity improvements, resiliency best practices, and service optimization initiatives.
- Promote operational excellence across reliability, availability, security, performance, recoverability, and capacity management.
Proactive Risk Management & Continuous Improvement
- Perform proactive health assessments, risk reviews, and operational analysis to identify opportunities for improvement and escalation prevention.
- Maintain a culture of curiosity by looking beyond immediate symptoms and root causes to understand systemic factors, historical decisions, and operational patterns that drive long-term improvements.
- Correlate customer requirements, operational events, and platform signals into actionable recommendations with clear accountability and ownership.
- Drive operational maturity through recommendations for observability, monitoring, automation, governance, reliability engineering practices, disaster recovery preparedness, and service management processes.
- Utilize telemetry, monitoring platforms, observability tools, and query languages to investigate issues, identify trends, and develop actionable insights.
Customer Engagement & Service Delivery
- Develop and maintain deep technical understanding of assigned customer environments, architectures, dependencies, and mission-critical workloads.
- Create and maintain customer knowledge documentation, operational records (KnowMe), and workload profiles.
- Deliver onboarding assessments and help define service delivery and improvement plans aligned with customer objectives.
- Scope technical engagements, facilitate discussions on workstreams, prioritize recommendations, and align stakeholders on action plans and expected outcomes.
- Track remediation progress and drive alignment across customers and Microsoft stakeholders.
Global Collaboration & Stakeholder Management
- Operate effectively within a global follow-the-sun support model, collaborating with teams across multiple regions and time zones to ensure continuity of service for mission-critical workloads.
- Maintain awareness of ongoing customer engagements, incidents, escalations, and engineering activities occurring outside local business hours, incorporating relevant developments into ongoing service delivery.
- Drive effective cross-time-zone coordination through structured handoffs, action tracking, stakeholder alignment, and knowledge sharing.