Production Operations / Reliability Engineer
Responsibilities
Live Monitoring & Telemetry
- Monitor telemetry, dashboards, logs, alerts, and metrics to assess the health of services, applications, and prototype devices.
- Identify anomalies, failures, and performance degradation across software and device environments.
- Analyze real-time and historical data to diagnose issues and surface reliability risks.
- Triage operational issues and communicate findings clearly to engineering and product teams.
- Recommend improvements to monitoring coverage, alert quality, and operational visibility.
Release & Operations Support
- Support software releases by validating deployments and monitoring post-release system stability.
- Track service and device health during rollouts, updates, and release validation periods.
- Investigate and assist in resolving live issues impacting internal users or device readiness.
- Partner with engineering teams on mitigations, fixes, rollbacks, and follow-up validation.
- Document release observations, risks, and stability assessments.
Incident Response & Reliability
- Support incident response by gathering logs, diagnostics, and impact data.
- Summarize incidents, suspected root causes, and mitigation progress.
- Participate in post-incident reviews and document lessons learned.
- Maintain records of incidents, recurring issues, and known reliability risks.
- Identify opportunities to reduce operational toil through documentation or process improvements.
On-Site Hardware & Environment Support
- Perform in-person troubleshooting for prototype devices and self-hosted systems when needed.
- Assist with device configuration, deployment, validation, and health checks.
- Run smoke tests and readiness checks to confirm system and device stability.
- Document hardware configurations, operational procedures, and environment setup.
Collaboration & Communication
- Work cross-functionally with software engineering, QA, infrastructure, and product teams.
- Communicate system health, risks, and technical findings clearly and concisely.
- Provide regular status updates, health summaries, and operational reports.
- Operate independently while escalating issues appropriately when deeper engineering support is required.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- 5-7+ years of experience in software engineering, DevOps, SRE, production engineering, service operations, or infrastructure roles.
- Experience monitoring live systems using telemetry, logs, metrics, dashboards, and alerting tools.
- Strong troubleshooting skills across software, services, and de
Additional Information
Who is Blueprint?
We are a technology solutions firm headquartered in Bellevue, Washington, with a strong presence across the United States. Unified by a shared passion for solving complicated problems, our people are our greatest asset. We use technology as a tool to bridge the gap between strategy and execution, powered by the knowledge, skills, and expertise of our teams, who all have unique perspectives and years of experience across multiple industries. We're bold, smart, agile, and fun.
What does Blueprint do?
Blueprint helps organizations unlock value from existing assets by leveraging cutting-edge technology to create additional revenue streams and new lines of business. We connect strategy, business solutions, products, and services to transform and grow companies.
Why Blueprint?
At Blueprint, we believe in the power of possibility and are passionate about bringing it to life. Whether you join our bustling product division or our multifaceted services team, or you want to grow your career in human resources, your ability to make an impact is amplified when you join one of our teams. You'll focus on solving unique business problems while gaining hands-on experience with the world's best technology. We believe in unique perspectives and build teams of people with diverse skillsets and backgrounds. At Blueprint, you'll have the opportunity to work with multiple clients and teams, such as data science and product development, all while learning, growing, and developing new solutions. We guarantee you won't find a better place to work and thrive than at Blueprint.
In This Role
In this role, you will support the reliability, stability, and live operations of a new device and software platform during internal testing and self-host programs. You will focus on monitoring system health through telemetry, investigating live issues, supporting software releases, and validating prototype devices in production-like environments.
This is a hands-on, engineering-oriented operations role where you will work closely with software engineers, QA, infrastructure, and product partners to ensure operational readiness and service stability. You will independently manage day-to-day monitoring, triage incidents, support release validation, and provide clear, actionable insights to improve system reliability and product readiness.