Principal Engineer - DBRE

Arcesiumllc · Hyderabad, India
Full-time · On-site · 6d ago
AWS · CI/CD · CloudFormation · Datadog · GitLab · GitLab CI

Responsibilities

  • Drive architectural direction for the database platform across SQL Server, Aurora PostgreSQL, and Snowflake - covering high availability, disaster recovery, replication, backup and recovery, capacity, performance, and security.
  • Own complex, cross-cutting initiatives such as cross-region disaster recovery, platform refresh orchestration, alerting redesign, and cost optimization, taking each from problem statement through to a deployed, owned solution.
  • Lead by example with exemplary code, design documents, RFCs, and runbooks, setting the standard for technical writing, code quality, and operational rigor across the DBRE team.
  • Reduce operational toil by engineering automation across provisioning, refresh, patching, scaling, failover, and decommissioning - treating manual operations as bugs to be eliminated.
  • Lead alert engineering to drive sustainable reductions in alert volume while improving signal quality, partnering with application teams on alert ownership, attribution, and SLA design.
  • Drive incident response and root-cause analysis for the most complex production incidents, and convert RCAs into platform-level improvements that prevent recurrence.
  • Define reliability KPIs (availability, MTTR, alert sustainability, SLA adherence) and build the dashboards and reporting cadence to track them.
  • Partner with application engineering, infrastructure, and SRE teams on schema design, query performance, data lifecycle, and shared reliability patterns, and engage senior leadership on strategy, multi-quarter roadmaps, and budget trade-offs.

Requirements

  • A bachelor's or master's degree in Computer Science, Engineering, or a related field, with 9+ years of professional engineering experience, including significant time in a principal-level or equivalent individual contributor role.
  • Deep, hands-on expertise in at least one major relational database platform (SQL Server or PostgreSQL) including replication, HA/DR architectures, performance tuning, query optimization, and internals.
  • Strong working knowledge of cloud infrastructure (AWS preferred): VPC networking, EC2, EBS, FSx, IAM, RDS/Aurora, and cross-region replication.
  • Strong programming skills in at least one of Python, PowerShell, Go, or T-SQL - capable of writing production-quality automation, not just scripts.
  • A proven track record designing and delivering large-scale reliability initiatives (HA/DR, observability, automation platforms) with measurable outcomes.
  • Experience leading complex incident response, root-cause analysis, and post-incident improvement programs in 24x7 environments.
  • Experience with observability platforms (Datadog, Prometheus, Grafana), modern alerting design, infrastructure-as-code (Terraform, CloudFormation), and CI/CD pipelines (GitLab CI, Jenkins).
  • Exceptional verbal and written communication skills, with the ability to produce clear design documents and executive-level summaries and to influence stakeholders across engineering, infrastructure, and business teams.
  • Experience across multiple database platforms (SQL Server / PostgreSQL / Snowflake / Aurora) and familiarity with financial-services data domains is a bonus.
  • Arcesium's Personal Data Privacy Notice for Candidates is linked here.
  • Recruiting Security: Emails from genuine Arcesium recruiters who are employees of the company will always come from the @arcesium.com domain. In some cases, you may also be contacted by independent search firms engaged to recruit on our behalf; emails from their

Benefits

  • Vision insurance
  • Performance bonus

Additional Information

Company Overview

Arcesium is a global financial technology firm that solves complex data-driven challenges faced by some of the world's most sophisticated financial institutions. We constantly innovate our platform and capabilities to meet tomorrow's challenges, anticipate the risks our clients encounter, and design advanced solutions to help our clients achieve transformational business outcomes.

Financial technology is a high-growth industry as change and innovation continue to disrupt the status quo and prompt major transformation. Arcesium is at a particularly interesting time in our own growth as we look to leverage our successfully established market position and expand operations in pursuit of strategic new business opportunities. We value intellectual curiosity, proactive ownership, and collaboration with colleagues, and we empower you to meaningfully contribute from day one and accelerate your professional development.

We are looking for an exceptional engineer to provide expert-level technical leadership for our Database Reliability Engineering (DBRE) platform. This is a hands-on individual contributor role that owns the architectural direction for our most complex database reliability challenges - high availability, disaster recovery, observability, and platform automation - across thousands of SQL Server, Aurora PostgreSQL, and Snowflake environments running mission-critical workloads for the world's most sophisticated financial institutions.


Interested in this role?

Apply on the company's website.