Senior Data AI Engineer
You have a clear vision of where your career can go. And we have the leadership to help you get there. At CNA, we strive to create a culture in which people know they matter and are part of something important, ensuring the abilities of all employees are used to their fullest potential.

This is a senior individual contributor role responsible for designing, building, and operationalizing end-to-end AI and machine learning solutions that accelerate CNA's migration to a modern cloud data lakehouse. The engineer works across structured and unstructured data domains - including documents, images, audio, and transactional records - to unlock analytical value through scalable pipelines, RAG architectures, vector databases, and knowledge graphs. This role may also provide guidance to others to support the building of complex technical capabilities.

JOB DESCRIPTION:

Essential Duties & Responsibilities

Performs a combination of duties in accordance with departmental guidelines:

Design and build AI solutions that accelerate data migration from legacy systems to the cloud, ensuring scalability, reliability, and governance compliance.
Design and implement scalable ingestion and transformation pipelines across structured (SQL, relational) and unstructured (documents, images, audio, email, call transcripts) data sources, applying OCR, NLP preprocessing, and document chunking strategies optimized for LLM consumption.
Implement modern lakehouse patterns on Google Cloud Platform (GCP) - including data governance, cataloging, and lineage tracking - to ensure data is reliably discoverable, auditable, and fit for AI/ML workloads at scale.
Design and implement vector databases, embedding pipelines, and knowledge graph structures that serve as the foundational retrieval layer for RAG and other AI applications.
Productionize and operationalize AI solutions and advanced analytics in a DevOps/MLOps environment, including automated testing, monitoring, and rollback capabilities.
Cultivate innovation by proactively proposing new ideas and identifying the right combination of tools and frameworks to turn business problems into analytics solutions.
Research, identify, and implement process improvements that address complex technology gaps.
Build strong knowledge of technology enablers.
May perform additional duties as assigned.

Reporting Relationship

Typically Director or above.

Skills, Knowledge & Abilities

Deep expertise building scalable ingestion and transformation pipelines across structured and unstructured data sources; strong background migrating workloads from legacy systems to modern cloud platforms.
Skilled in parsing and normalizing diverse content types - PDFs, emails, images, and call transcripts - using OCR, NLP preprocessing (tokenization, entity extraction, summarization), and document chunking strategies optimized for LLM consumption.
Proven experience designing and implementing vector databases (e.g., Vertex AI Vector Search, Pinecone, pgvector), embedding pipelines, and knowledge graph structures that underpin RAG and semantic search applications.
Strong SQL and data analytical skills; experience building data marts and feature datasets for data science and ML applications.
Strong coding fluency in Python; hands-on experience with BigQuery, Claude Code, RAG architectures, LLMs, ADK, and prompt engineering techniques.
Expertise in building ML platforms and data pipelines at scale; familiarity with major ML algorithms, deep learning, NLP, information retrieval, and data mining techniques.
Experience with GCP services (Vertex AI, Dataflow, BigQuery, Cloud Run, Pub/Sub); comfort with distributed computing frameworks (Apache Spark, Dataproc) for large-scale data processing.
Solid experience managing diverse data sources, including preprocessing, cleansing, and verifying data integrity to meet data science and ML requirements.
Demonstrated experience with machine learning, deep learning, information retrieval, NLP, or data mining - particularly applied to unstructured or semi-structured data.
Hands-on experience with vector databases, embedding models (e.g., text-embedding-gecko, OpenAI Ada, Cohere), and end-to-end RAG pipeline design.
Experience using Agile methods preferred.
Strong communication and interpersonal skills and the ability to work effectively with peers and team members in a highly matrixed environment.
Experience with the insurance industry and its products and services preferred.
Experience implementing big data processing technology; Apache Spark preferred.

Education & Experience

Bachelor's Degree in Computer Science, Engineering, Mathematics, Computational Statistics, Data Science, or a related technical field (or equivalent experience); Master's Degree preferred.
Typically 7+ years of experience in data engineering, artificial intelligence, or machine learning.
2+ years of coding proficiency in at least one programming language (Python, Java, SQL).
Applicable certifications preferred.