Senior Site Reliability Engineer

AssetMark · Charlotte, NC

Full-time · Hybrid · 1w ago

AWS · Azure · Capacity Planning · CI/CD · Compliance · Confluence

About the role

We are seeking a Site Reliability Engineer (SRE) to join our Charlotte-based engineering team. This role sits at the center of platform resilience - ensuring high availability, performance, recoverability, and operational maturity across AssetMark's production systems.

This is not a traditional operations role. Our SREs are engineers first: designing automation, building observability frameworks, improving deployment safety, defining reliability standards, and reducing operational toil through code. You will influence architectural decisions, strengthen incident management practices, and raise the reliability bar across both legacy and cloud-native systems.

You will work on systems that operate 24/7, support financial transactions and advisor workflows, and must meet strict regulatory and security requirements. The right candidate is energized by complex distributed systems, high-stakes production environments, and the responsibility of building durable, scalable financial infrastructure.

At AssetMark, reliability is a first-order expression of client obsession. Our SRE team plays a critical role in delivering the consistent, trusted technology experience that advisors depend on to run their businesses.

We can only consider candidates for this position who are able to accommodate a hybrid work schedule and are close to our Charlotte, NC office.

Responsibilities

Reliability Engineering & Operations
Design, implement, and continuously improve the reliability, availability, and performance of critical AssetMark systems (batch, APIs, integrations, and customer-facing platforms)
Define and operationalize SLIs, SLOs, and error budgets for critical services in partnership with engineering and product teams
Participate in on-call rotations, incident response, and major incident management
Lead and contribute to blameless post-incident reviews, driving root cause analysis and measurable reliability improvements
Proactively identify reliability risks and lead remediation efforts before they impact clients
Observability & Monitoring
Build and maintain end-to-end observability across applications, infrastructure, and integrations (metrics, logs, traces, alerts)
Implement actionable monitoring and alerting to reduce noise and improve signal quality
Partner with application teams to instrument services using best-in-class observability practices
Ensure visibility into system health, capacity, performance, and failure modes across environments
Automation & Toil Reduction
Identify repetitive operational tasks and automate them through code
Improve deployment reliability through automation, self-service tooling, and safe rollout patterns
Reduce manual intervention in batch processing, integrations, and operational workflows
Apply Infrastructure-as-Code and configuration automation to improve consistency and repeatability
Cloud, Platform & Infrastructure Reliability
Support reliability of Azure-based infrastructure, containerized workloads, and hybrid environments
Partner with platform, DevOps, and infrastructure teams to improve resilience, scalability, and recovery
Contribute to capacity planning, performance tuning, and cost-aware reliability decisions
Ensure systems meet RTO/RPO, backup, and disaster recovery expectations
Secure & Compliant Operations
Embed security, compliance, and risk controls into operational practices
Work closely with Security and Compliance teams to meet financial services regulatory requirements
Ensure production systems follow least privilege, secure configuration, and auditability standards
Support vulnerability remediation and secure operational processes
Collaboration & Enablement
Partner with application engineering teams to improve production readiness and operational maturity
Influence system design by advocating for reliability-first architectural decisions
Provide guidance on operational best practices, deployment safety, and observability standards
Document operational patterns, runbooks, and reliability guidelines in Confluence
Act as a reliability advocate across AssetMark engineering teams

Knowledge, Skills, Abilities

Strong software engineering skills in .NET / C# (or Python, Java, or similar)
Experience operating distributed systems in production
Deep understanding of SRE principles: SLIs/SLOs, error budgets, toil reduction, incident management
Experience with Azure (or AWS/GCP), including compute, networking, and managed services
Knowledge of containerization and orchestration (Docker, Kubernetes preferred)
Experience with monitoring, logging, tracing, and alerting tools
Familiarity with CI/CD pipelines, automation, and Infrastructu

Benefits

Health insurance

Additional Information

Job Description: AssetMark is a leading strategic provider of innovative investment and consulting solutions serving independent financial advisors. We provide investment, relationship, and practice management solutions that advisors use in helping clients achieve wealth, independence, and purpose.
