Senior Site Reliability Engineer
About the role
We are seeking a Site Reliability Engineer (SRE) to join our Charlotte-based engineering team. This role sits at the center of platform resilience: ensuring high availability, performance, recoverability, and operational maturity across AssetMark's production systems.
This is not a traditional operations role. Our SREs are engineers first: designing automation, building observability frameworks, improving deployment safety, defining reliability standards, and reducing operational toil through code. You will influence architectural decisions, strengthen incident management practices, and raise the reliability bar across both legacy and cloud-native systems.
You will work on systems that operate 24/7, support financial transactions and advisor workflows, and must meet strict regulatory and security requirements. The right candidate is energized by complex distributed systems, high-stakes production environments, and the responsibility of building durable, scalable financial infrastructure.
At AssetMark, reliability is a first-order expression of client obsession. Our SRE team plays a critical role in delivering the consistent, trusted technology experience that advisors depend on to run their businesses.
We can only consider candidates for this position who are able to accommodate a hybrid work schedule and are close to our Charlotte, NC office.
Responsibilities
Reliability Engineering & Operations
- Design, implement, and continuously improve the reliability, availability, and performance of critical AssetMark systems (batch, APIs, integrations, and customer-facing platforms)
- Define and operationalize SLIs, SLOs, and error budgets for critical services in partnership with engineering and product teams
- Participate in on-call rotations, incident response, and major incident management
- Lead and contribute to blameless post-incident reviews, driving root cause analysis and measurable reliability improvements
- Proactively identify reliability risks and lead remediation efforts before they impact clients
Observability & Monitoring
- Build and maintain end-to-end observability across applications, infrastructure, and integrations (metrics, logs, traces, alerts)
- Implement actionable monitoring and alerting to reduce noise and improve signal quality
- Partner with application teams to instrument services using best-in-class observability practices
- Ensure visibility into system health, capacity, performance, and failure modes across environments
Automation & Toil Reduction
- Identify repetitive operational tasks and automate them through code
- Improve deployment reliability through automation, self-service tooling, and safe rollout patterns
- Reduce manual intervention in batch processing, integrations, and operational workflows
- Apply Infrastructure-as-Code and configuration automation to improve consistency and repeatability
Cloud, Platform & Infrastructure Reliability
- Support reliability of Azure-based infrastructure, containerized workloads, and hybrid environments
- Partner with platform, DevOps, and infrastructure teams to improve resilience, scalability, and recovery
- Contribute to capacity planning, performance tuning, and cost-aware reliability decisions
- Ensure systems meet RTO/RPO, backup, and disaster recovery expectations
Secure & Compliant Operations
- Embed security, compliance, and risk controls into operational practices
- Work closely with Security and Compliance teams to meet financial services regulatory requirements
- Ensure production systems follow least privilege, secure configuration, and auditability standards
- Support vulnerability remediation and secure operational processes
Collaboration & Enablement
- Partner with application engineering teams to improve production readiness and operational maturity
- Influence system design by advocating for reliability-first architectural decisions
- Provide guidance on operational best practices, deployment safety, and observability standards
- Document operational patterns, runbooks, and reliability guidelines in Confluence
- Act as a reliability advocate across AssetMark engineering teams
Knowledge, Skills, Abilities
- Strong software engineering skills in .NET / C# (or Python, Java, or similar)
- Experience operating distributed systems in production
- Deep understanding of SRE principles: SLIs/SLOs, error budgets, toil reduction, incident management
- Experience with Azure (or AWS/GCP), including compute, networking, and managed services
- Knowledge of containerization and orchestration (Docker, Kubernetes preferred)
- Experience with monitoring, logging, tracing, and alerting tools
- Familiarity with CI/CD pipelines, automation, and Infrastructu
Additional Information
Job Description: AssetMark is a leading strategic provider of innovative investment and consulting solutions serving independent financial advisors. We provide investment, relationship, and practice management solutions that advisors use in helping clients achieve wealth, independence, and purpose.