Senior AIOps Engineer I

Porch · Remote

Full-time · Remote · 2w ago

BigQuery · CI/CD · Datadog · Docker · GCP · Kubernetes

Benefits

Health insurance · Vision insurance · Remote work options

Additional Information

Porch Group is a leading vertical software and insurance platform and is positioned to be the best partner to help homebuyers move, maintain, and fully protect their homes. We offer differentiated products and services, with homeowners insurance at the center of this relationship. We differentiate and look to win in the massive and growing homeowners insurance opportunity by 1) providing the best services for homebuyers, 2) led by advantaged underwriting in insurance, 3) to protect the whole home.

As a leader in the home services software-as-a-service ("SaaS") space, we've built deep relationships with approximately 30 thousand companies that are key to the home-buying transaction, such as home inspectors, mortgage companies, and title companies. In 2020, Porch Group rang the Nasdaq bell and began trading under the ticker symbol PRCH. We are looking to build a truly great company and are JUST GETTING STARTED.

Job Title: Senior AIOps Engineer I
Location: India
Workplace Type: Remote

Job Summary

The future is bright for the Porch Group, and we'd love for you to be a part of it as our Senior AIOps Engineer I.

We are looking for a Senior AIOps Engineer I who will partner with product managers, platform engineers, data scientists, and machine learning engineers to ensure our AI and ML-powered systems are reliable, observable, secure, and cost-efficient in production. You will focus on how AI systems run in real-world environments: monitoring model performance and drift, ensuring robust deployment pipelines, managing incidents, standing up new AI infrastructure, and improving the stability and scalability of our AI platform. You'll help evolve our AI & ML Ops stack and operational processes so teams can ship AI features quickly and safely.

Our AI/ML stack is based on Python and runs on Kubernetes (GKE) and Google Cloud Platform. We use tools such as Union Cloud (Flyte) for ML workflow orchestration, BentoML for model serving, Feast for feature stores, Label Studio for data annotation, BigQuery as our central data warehouse, and Dataflow for streaming/batch data pipelines. On the GenAI side, we operate a centralized LLM routing/gateway service across providers, batch prediction services for large-scale LLM inference, and are building out RAG infrastructure. You will maintain and harden this ecosystem - and stand up new infrastructure components as we expand our AI platform capabilities.

What You Will Do As A Senior AIOps Engineer I

Own production reliability for AI/ML services

Monitor and improve the reliability, availability, and performance of AI/ML-powered services running in production.
Define and maintain SLOs/SLIs for critical AI systems (e.g., latency, error rates, model performance), tying them to user experience and business impact where possible.
Own recurring model refresh cycles - coordinate retraining, validation, and redeployment of production models to prevent staleness and drift.

Build and improve AI observability

Design and implement monitoring, logging, and alerting for models and data pipelines in partnership with AI Engineers and Data Scientists.
Integrate model and system metrics with existing observability stacks (Datadog, Opik, etc.) and dashboards used by engineering and operations teams.
Build and maintain monitoring workflows for pipeline health.

Support scalable, safe deployment of models

Collaborate with data scientists and ML engineers to streamline deployment workflows for models and related services (blue/green, canary, A/B, shadow deployments).
Support the productionization of image-based ML models, including batch prediction workflows, model performance monitoring, and data pipeline integration.
Improve CI/CD pipelines and release processes for AI services to reduce risk and increase deployment frequency.

Stand up and operate AI infrastructure

Provision, deploy, configure, and maintain new AI infrastructure components as they are adopted across the organization - including AI gateways, RAG platforms, LLM observability tools, agentic workflows, and no-code agent builders.
Utilize and improve existing frameworks and tools (e.g., Union Cloud, BentoML, Feast, Kubernetes, Terraform, and GCP services) to support robust and maintainable AI infrastructure.
Build automation and tooling to reduce manual operational work, especially around model promotion, configuration, environment management, and Docker image maintenance.
Support multi-BU infrastructure provisioning - create and manage separate environments, projects, roles, and CI/CD integrations for different business units.

Operate and maintain Label Studio

Own the operational health of Label Studio - our production data annotation platform used for ground truth collection, model evaluation, and ML training dataset creation.
Maintain the supporting infrastructure around Label Studio, including GCS storage buckets and BigQuery data pipelines that feed annotation projects; coordinate with Platform/IT partners for database and SSO de
