Operate and support core Azure services (compute, storage, networking integrations, identity, logging/monitoring) to meet uptime and SLA expectations; perform root cause analysis and implement prevention/remediation.
Troubleshoot production cloud issues, restore service, and drive operational improvements; maintain clear runbooks and problem/resolution documentation.
Manage and improve cloud tagging, logging, alerting, and operational standards.
Administer and evolve Azure Kubernetes Service (AKS) as a core platform: cluster lifecycle management, upgrades, configuration standards, capacity/scaling, and operational readiness.
Partner with Application and Data Engineering teams to enable secure deployment patterns on AKS for both application and AI/agent workloads.
Build and maintain Infrastructure-as-Code using Terraform and Bicep where appropriate; contribute to reusable modules and standardized patterns.
Create script-level automation for cloud operations and deployments using PowerShell and/or Python.
Troubleshoot CI/CD pipeline failures and improve release workflow reliability across Azure DevOps and GitHub Actions.
Operate and enhance monitoring/alerting/logging standards and dashboards using Azure Monitor and Log Analytics; contribute to adoption of OpenTelemetry practices and integration with visualization tooling (e.g., Grafana).
Provide operational support for Microsoft Entra ID (formerly Azure AD) and related access controls (MFA, conditional access as applicable), collaborating with Security/Network teams on patterns and guardrails.
Support compliance readiness (e.g., SOC-aligned controls) through standards, evidence-friendly configuration, and disciplined change practices.
Provide secondary operational support and troubleshooting for the data platform, including Snowflake, Microsoft SQL Server, and PostgreSQL, partnering with Data Engineering/DBA teams as needed.
Own cloud cost optimization activities: tagging standards, cost allocation visibility, budget alerts, cost anomaly investigation, and continuous cost/performance tuning.
Support Azure-hosted AI workloads and Microsoft Foundry services (including AKS-hosted bots/agents) with a focus on secure access controls, secrets management, observability, and guardrails.
Contribute to AI-assisted engineering workflows.
Knowledge, Skills & Abilities:
Strong hands-on Azure experience across foundational services (compute, storage, networking constructs, App Service/managed services where applicable).
AKS / Kubernetes operational experience, including cluster administration concepts (upgrades, scaling, reliability).
Infrastructure as Code: Terraform experience (modules/patterns); familiarity with Bicep is a plus.
Scripting/automation with PowerShell and/or Python.
Monitoring and troubleshooting with Azure Monitor and Log Analytics; familiarity with OpenTelemetry concepts preferred.
Demonstrated ability to write clear documentation/runbooks and communicate effectively across teams.
Bachelor's degree in IT or a related technical field (or equivalent experience).
4+ years supporting cloud platforms and production operations (Azure-centric).
Experience supporting CI/CD systems (Azure DevOps; GitHub Actions familiarity welcome).
Experience partnering with Security/Network teams on secure patterns and operational guardrails.
Experience operating AI/agent services on Kubernetes and contributing to safe prompt/agent guardrails and observability approaches.
Experience with Grafana dashboards and alerting integration.
Familiarity with Snowflake, SQL Server, and/or PostgreSQL operational support is a plus.
Compensation
Additional Information
Job Description:
AssetMark is a leading strategic provider of innovative investment and consulting solutions serving independent financial advisors. We provide investment, relationship, and practice management solutions that advisors use in helping clients achieve wealth, independence, and purpose.
AssetMark is seeking a knowledgeable, hands-on Cloud Engineer to operate and continuously improve our Azure cloud platform. This role blends day-to-day cloud operations with building and owning shared platform capabilities (guardrails, automation, observability, and AKS platform operations). You will co-design solutions and partner closely with dedicated Network and Security teams to deliver secure, reliable, compliant, and cost-effective cloud services. The role's primary responsibilities span the following cloud infrastructure areas: Azure Platform Operations & Reliability; Azure Kubernetes Service; Infrastructure as Code & Automation; CI/CD Enablement; Observability; Identity, Security, and Compliance; Data Platform Support; FinOps and Cost Governance; and AI Enablement.
We can consider only candidates who are able to accommodate a hybrid work schedule and are located near our Charlotte, NC office.