Senior Observability / Monitoring Engineer - R01564150
About the role
Role Overview
We are looking for a Senior Observability / Monitoring Engineer to design, implement, and optimize observability solutions for large-scale enterprise platforms. This role will play a critical part in enabling proactive monitoring, faster incident detection, and improved system reliability across Salesforce and Microsoft Azure environments. The ideal candidate will have strong expertise in metrics, logs, traces, alerting strategies, and observability tooling, along with hands-on experience supporting production environments.

Key Responsibilities

Observability Engineering
- Design and implement end-to-end observability frameworks across Salesforce and Azure platforms
- Establish unified monitoring across logs, metrics, and distributed tracing
- Define and standardize observability best practices, dashboards, and alerting strategies
- Enable proactive detection of issues through intelligent alerting and anomaly detection

Monitoring & Tooling
- Implement and manage tools such as Azure Monitor, Application Insights, Splunk, Datadog, Grafana, Prometheus, or similar
- Build actionable dashboards for operations, SRE, and business stakeholders
- Optimize alert noise reduction and improve signal-to-noise ratio
- Continuously enhance monitoring coverage across applications and infrastructure

Incident Support & Reliability
- Support incident management by providing deep insights using observability data
- Perform root cause analysis (RCA) leveraging logs, traces, and metrics
- Collaborate with SRE and engineering teams to improve system reliability and performance
- Contribute to post-incident reviews and continuous improvement initiatives

Automation & Integration
- Automate monitoring setup and configuration using Infrastructure as Code (IaC)
- Integrate observability tools with CI/CD pipelines and DevOps workflows
- Develop scripts or tools to enhance monitoring capabilities and data collection

Platform & Integration Support
- Monitor and optimize Salesforce applications, including integrations and APIs
- Support Azure-based services, ensuring visibility across compute, storage, and networking layers
- Ensure end-to-end observability across integrated systems and middleware

Governance & Compliance
- Ensure observability practices align with security and compliance requirements (e.g., SOX)
- Maintain documentation, runbooks, and monitoring standards
- Support audits and governance reviews as required

Required Skills & Qualifications

Technical Skills
- Strong experience in observability, monitoring, or SRE roles
- Hands-on experience with Azure (Azure Monitor, Application Insights)
- Experience with observability tools (Splunk, Datadog, Prometheus, Grafana, etc.)
- Strong understanding of logs, metrics, traces, and distributed systems
- Experience with APM tools and performance tuning
- Scripting skills (Python, PowerShell, Bash, or similar)
- Familiarity with CI/CD tools (Azure DevOps, Jenkins, GitHub Actions)
- Knowledge of Infrastructure as Code (Terraform, ARM, Bicep)

Platform Knowledge
- Experience supporting Salesforce environments (monitoring integrations, APIs, performance)
- Understanding of cloud-native and microservices architectures

Operational Excellence
- Experience in incident management and RCA
- Ability to analyze system performance and recommend improvements
- Strong troubleshooting and analytical skills

Soft Skills
- Strong communication and collaboration skills
- Ability to work with cross-functional and global teams
- Proactive mindset with a focus on continuous improvement
Worked at Brillio? Share your experience