2026-0095 Big Data and AI Technology for Searchable Archives (NS) - WED 8 Jul
About the role
BIDDING INSTRUCTIONS

The Bidder shall submit the Proposed Person Curriculum Vitae (CV). This CV shall have enough details and evidence of the individual's previous work to show suitability and compliance for the job based on the work description included in the Statement of Work.

Deadline Date: Wednesday 08 July 2026
Requirement: Big Data and AI Technology for Raw Data to Searchable Archives - Data Processing Pipeline
Location: On-site at NATO Communications and Information Agency, The Hague, The Netherlands
Period of Performance: 2026-27: 12 August 2026 - 28 February 2027
Required Security Clearance: NATO SECRET

1. INTRODUCTION

The NATO Communications and Information Agency (NCIA) located in The Hague, Netherlands, is currently involved in processing vast amounts of highly variant data coming from theatre for the purpose of efficient archiving. Within NCIA Chief Technology Office, the Exploiting Data Science and Artificial Intelligence (EDS&AI) team is tasked to apply Big Data and AI technology to prepare, run and adjust processing pipelines for processing various source data into archiving formats and metadata, and prepare for semantic search.

NATO has an obligation to support national investigations into situations that occurred in theatre. In order to support the different teams involved most optimally, the EDS&AI team brings the expertise to extract and exploit the vast and varied data on the table, by using the Agency's high performance computing classified sandbox. The EDS&AI team provides the core data science skills and technology needed for big data analysis and AI, and applies innovative technology to data whenever it is not possible to extract value with conventional approaches.

2. OBJECTIVE

This Statement of Work describes the work necessary to provide specific AI and Data Exploitation activities for processing raw data from theatre to searchable archives.
The services will be provided to the NCIA CTO/EDS&AI team, as they deliver specialised Data Science and AI results to their stakeholders in NATO Headquarters and NATO Allied Command Operations.

Overarching objectives:
- Make required documents from theatre accessible and searchable by archivists during execution
- Capture document contents into long term preservation formats
- Capture Functional Area System (FAS; back-up) contents into long term preservation formats
- Identify (and remove) duplicate documents, records of temporary value and non-records that are not required for archiving
- Provide interim and final data reports describing actions and results

This task is structured as a deliverable-based engagement and not as level-of-effort support.

3. SCOPE OF WORK

Under the direction of CTO-EDS&AI, the Contractor shall design, build, adapt, execute and maintain data processing pipelines within the NCIA classified sandbox environment.
- Setting up and improving pipelines to process all required documents that uniquely identify and trace decisions and processing steps. This is to be conducted on the provided classified sandbox environment, with provided performance hardware and toolsets.
- Implementing and improving pipeline steps for marking duplicate files, based on file attributes, path structure and content similarity, and rules for considering a file or structure a duplicate.
- Extracting document-format records from Functional Area Systems (FAS) databases and back-ups. Archiving SMEs and system SMEs are available for guidance on target formats and source system structure and data interpretation. Each FAS is processed separately.
- Processing and monitoring progress of various office, image and video file types to accepted archiving formats, including extraction of metadata and preparing semantic search indexes.
- Automating the registration of all processed documents with semantic indexes using the sandbox natural language search tool.
- Automating the final copy of all non-duplicate and extracted archive documents with content and metadata to the NATO archiving system.
- Reporting status, progress and statistics of the raw files being processed to archive formats, metadata and search indexes.
- Delivering full reporting of results, trace of pipeline steps taken and stakeholder-accepted failures. Quarterly updates.

In general, most items will translate to a build (new pipeline or processing step), execute (reported progress on data batches), improve (optimized or corrected pipeline or processing step) or monitor (check on logs and progressing statistics) activity. Pipeline orchestration is expected to utilize KNIME. Reporting efforts are expected to target Microsoft Power BI dashboards. GitLab is expected to be used for source code management and documentation.

4. DELIVERABLES AND PAYMENT MILESTONES

Work shall be delivered through Processing Units (PU). A Processing Unit represents completion of a prioritized measurable work package. The following Processing Unit types and quantities apply:
- Delivery or enhancement of a pipeline component: 4 PUs
- Execution of p