Several of Paris's most prominent cultural and archival institutions have moved this week to address a systemic problem with duplicate and incorrectly labelled images embedded in their public-facing digital catalogues — a technical failure that has quietly undermined the reliability of online collections accessible to millions of researchers, tourists and students each year.
The issue, which archivists and digital preservation specialists have flagged internally for months, came into sharper public focus this week after users of Gallica, the Bibliothèque nationale de France's online library platform, began reporting that search results for historical Paris street photographs were returning the same image multiple times under different catalogue numbers, dates and even different arrondissements. The BnF, headquartered on the Quai François Mauriac in the 13th arrondissement, confirmed it is conducting a technical audit of affected collections.
How the Problem Developed
The duplication issue traces back to a series of digitisation drives carried out between 2018 and 2023, when institutions were under pressure to expand online access. Batches of physical photographs, maps and illustrated documents were scanned by multiple contractors at different stages, sometimes without cross-referencing existing catalogue entries. The result: an unknown but significant number of records now carry conflicting metadata pointing to the same underlying image.
The Musée Carnavalet, the city history museum on the Rue de Sévigné in the Marais that reopened in May 2021 after a major renovation, is among the institutions whose digitised holdings intersect with the Gallica platform. Archivists there have been asked to cross-check roughly 4,000 image records flagged as potential duplicates, according to documentation circulating among heritage professionals this week. The Paris Mémoire Vive project, a municipal initiative set up in 2022 to coordinate digitisation of neighbourhood-level photographic archives held by the mairies of individual arrondissements, is also under review.
The practical consequences extend beyond inconvenience. Scholars relying on digitised Haussmann-era streetscape photographs to study urban change in neighbourhoods such as Belleville or, just beyond the city limits, the Plaine Saint-Denis (areas undergoing intense transformation linked to Grand Paris Express construction and post-Olympics regeneration) risk building arguments on images that are misdated by a decade or more.
What Institutions Are Doing Now
The BnF has engaged a working group under its direction numérique et des technologies, which is expected to deliver an initial assessment by September 2026. The process relies on automated perceptual hashing, a technique that reduces each image to a compact fingerprint of its overall visual structure and flags pairs whose fingerprints differ by only a few bits, so near-identical copies surface regardless of file name or metadata. This is combined with manual review for edge cases where images are cropped or colour-corrected versions of the same source photograph.
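To illustrate the idea, here is a minimal sketch of one common variant, average hashing (aHash), in Python. The article does not say which hashing method or similarity threshold the BnF uses; the function names, the record IDs and the 5-bit threshold below are hypothetical, and the sketch works on small grayscale grids given as lists of numbers rather than on real image files.

```python
from itertools import combinations

# Hypothetical helpers and threshold for illustration; not the BnF's pipeline.

def average_hash(pixels, size=8):
    """Compute a size*size-bit average hash of a grayscale image.

    `pixels` is a 2-D list of 0-255 intensities. For simplicity this sketch
    assumes both dimensions are exact multiples of `size`, so the image can
    be shrunk by averaging equal-sized blocks.
    """
    rows, cols = len(pixels), len(pixels[0])
    bh, bw = rows // size, cols // size
    # Shrink to size x size by block averaging, discarding fine detail.
    small = [
        sum(pixels[r][c]
            for r in range(i * bh, (i + 1) * bh)
            for c in range(j * bw, (j + 1) * bw)) / (bh * bw)
        for i in range(size) for j in range(size)
    ]
    mean = sum(small) / len(small)
    # One bit per cell: 1 if the cell is brighter than the image's mean.
    bits = 0
    for value in small:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")

def find_suspected_duplicates(records, max_distance=5):
    """Return pairs of record IDs whose hashes differ by <= max_distance bits."""
    hashes = {rid: average_hash(px) for rid, px in records.items()}
    return [
        (a, b) for a, b in combinations(sorted(hashes), 2)
        if hamming(hashes[a], hashes[b]) <= max_distance
    ]

# Usage with synthetic 16x16 images (hypothetical record IDs):
img_a = [[c * 16 for c in range(16)] for r in range(16)]       # left-to-right gradient
img_b = [[c * 16 + 10 for c in range(16)] for r in range(16)]  # same image, brightened
img_c = [[r * 16 for c in range(16)] for r in range(16)]       # different image
print(find_suspected_duplicates({"rec-001": img_a, "rec-002": img_b, "rec-003": img_c}))
# prints [('rec-001', 'rec-002')]
```

Because every cell is compared with the image's own mean, a uniform brightness shift leaves the hash unchanged, which is why the brightened copy matches exactly. Average hashing is much weaker against cropping, which changes what falls inside each cell, so such pairs would still need the kind of manual review described above.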
The cost of remediation is not trivial. Industry benchmarks for large-scale digital catalogue cleaning in comparable European institutions (the British Library undertook a similar exercise in 2023, and the Rijksmuseum in Amsterdam completed one in 2022) suggest per-record correction costs ranging from €4 to €12, depending on the degree of manual intervention required. Applied to even a conservative estimate of 50,000 affected records in Paris's combined public collections, that range implies a bill of €200,000 to €600,000 before staff time is counted.
Researchers working at the Centre Pompidou's public information library, the BPI on the Rue Beaubourg in the 4th arrondissement, have already been advised by staff to verify any digitised image citation against at least one physical or secondary source before submission. That guidance, informal for now, reflects a quiet acknowledgment that the problem has spread beyond any single institution.
For anyone using Paris's public digital archives in the coming weeks, the practical advice is straightforward: treat catalogue numbers as provisional rather than authoritative, download the highest-resolution version available to allow visual comparison, and flag suspected duplicates using the BnF's feedback mechanism on individual Gallica record pages. The audit is ongoing and corrections will be pushed to the live catalogue on a rolling basis. Institutions say they expect the most heavily searched collections, those covering the pre-war city, the Occupation years and the postwar reconstruction of the banlieues, to be cleared and re-verified by the end of the third quarter of 2026.