Three of Paris's largest publicly funded cultural repositories announced this week they are accelerating a joint technical programme to identify and remove duplicate images from their online collections, a housekeeping effort that has taken on new urgency as the Grand Paris Express construction project generates thousands of new archival photographs each month. The initiative, coordinated under the umbrella of the Agence pour le Patrimoine Numérique de Paris, targets redundant image records that have accumulated across siloed databases over more than two decades of digitisation work.
The timing is not accidental. Since the Paris 2024 Olympics closed the door on an intensive period of city infrastructure documentation, institutional archives have been left holding overlapping sets of images — many capturing the same Seine riverbank renovations, the same Stade de France forecourt, the same metro shaft openings from slightly different angles on the same afternoon. Storage and licensing costs have climbed as a result, and public-facing portals have grown cluttered enough that researchers and journalists frequently report pulling the same photograph twice without realising it.
What Happened This Week
On Tuesday, the Bibliothèque nationale de France, based on the Quai François-Mauriac in the 13th arrondissement, confirmed it had completed a first automated sweep of its Gallica platform, flagging an estimated 40,000 image pairs as probable duplicates. Separately, the Musée Carnavalet, the city's dedicated museum of Parisian history, which reopened after renovation in 2021 on the Rue de Sévigné in the Marais, said its digital team had begun cross-referencing its own holdings against the BnF database using perceptual hashing, a technique that compares visual fingerprints of the images rather than file names or metadata tags. A third partner, the Établissement Public Paris La Défense, which manages imagery related to the western business district, joined the consortium framework on Wednesday.
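For readers curious how a visual fingerprint can be compared, the sketch below shows one simple form of perceptual hashing, often called an average hash. It is an illustration only, not the software the Carnavalet or the BnF actually use, which the institutions have not described. The function names, the tiny 4x4 sample grids and the `max_distance` threshold are all assumptions made for the example, and real tools would first shrink each photograph to a small grayscale grid, a step omitted here.

```python
from typing import List

# Assumption: each image has already been reduced to a small grayscale grid
# (rows of brightness values from 0 to 255). Real pipelines do this resizing first.
Pixels = List[List[int]]


def average_hash(pixels: Pixels) -> int:
    """Build a bit fingerprint: 1 where a pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def probable_duplicates(a: Pixels, b: Pixels, max_distance: int = 1) -> bool:
    """Flag a pair when the fingerprints differ in at most max_distance bits.

    The threshold is a hypothetical value chosen for this example.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical sample data: a dark-left/bright-right image, a slightly
# brighter copy of it, and an unrelated bright-top/dark-bottom image.
original = [[10, 10, 200, 200] for _ in range(4)]
brighter_copy = [[value + 5 for value in row] for row in original]
different = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]

print(probable_duplicates(original, brighter_copy))  # True
print(probable_duplicates(original, different))      # False
```

Because the fingerprint depends on the pattern of light and dark areas and not on exact pixel values, the brightened copy matches the original even though the two files would differ byte for byte, which is why this family of techniques can catch duplicates that file names and metadata miss.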
The practical trigger for this week's announcements was a deadline set under a 2024 European directive on public-sector data reuse, which requires member-state institutions to certify the integrity of publicly accessible digital collections by the end of the third quarter of 2026. France's Culture Ministry has linked compliance to the renewal of digitisation subsidies that several institutions depend on — in the BnF's case, a rolling grant programme worth roughly €8 million annually.
Why It Matters Beyond the Hard Drives
Duplicate images are more than a storage inconvenience. Under French intellectual property law, each separately catalogued image in a public collection can attract its own licensing record, meaning a single photograph appearing under thirty different identifiers may have generated thirty separate clearance requests from researchers, publishers and schools over the years. The Agence pour le Patrimoine Numérique estimates that cleaning up duplicates across the three institutions could simplify licensing workflows for up to 15 percent of their combined catalogue — a figure that, if confirmed, would represent a meaningful administrative saving across the sector.
For ordinary Parisians and the researchers who study the city, the more immediate benefit is usability. The Carnavalet's online portal, which draws on holdings ranging from Roman-era artefacts to late-twentieth-century street photography of Belleville and Ménilmontant, has long frustrated users whose searches return page after page of near-identical images. Librarians at the Médiathèque Françoise-Sagan on the Rue Léon Schwartzenberg in the 10th arrondissement, one of the city's busiest public research libraries, have flagged the duplicate problem to the consortium in writing, pointing to specific user complaints logged since January 2025.
The consortium expects to publish a joint technical report by September 15, ahead of the EU compliance deadline. Institutions that fail to meet the certification threshold risk losing access to the subsidised digitisation pipeline entirely, which would effectively freeze new acquisitions going into 2027. For anyone who uses these archives — for urban planning research, journalism, academic work, or neighbourhood history projects — the advice from the Agence is straightforward: if you have downloaded images from Gallica or the Carnavalet portal in the past eighteen months and catalogued them independently, check your records against the updated metadata files the BnF will begin publishing from July 14 onward. Duplicates in private research databases are a problem the institutions cannot fix for you.