Paris's public heritage institutions are sitting on a quietly expanding problem. Across several major digitisation programmes, duplicate and misattributed images have been accumulating in shared databases for years — and the reckoning is arriving now, as the city prepares to open new archive portals tied to the post-Olympics urban legacy push along the Seine.
The issue surfaced most visibly this spring, when the Bibliothèque nationale de France flagged internal inconsistencies in its Gallica platform, which hosts more than nine million digitised documents. Technical staff identified significant clusters of duplicate image records — in some cases the same photograph catalogued under three or four separate identifiers — that were distorting search results and inflating collection counts. The BnF has not published a formal error report, but the problem is an open subject at heritage sector conferences in the 11th and 13th arrondissements, where archivists and digital humanities scholars have spent recent weeks debating both causes and fixes.
Why the problem is worse than it looks
Duplicate images are not just a tidiness issue. When public institutions share datasets — as Paris's mairies, the Musée Carnavalet on the Rue des Francs-Bourgeois, and the Direction régionale des affaires culturelles increasingly do under Grand Paris interoperability frameworks — a single duplicated record can propagate across multiple systems simultaneously. A photograph of a Haussmann-era façade in the 9th arrondissement, entered twice under slightly different metadata, effectively becomes two competing authoritative records. Researchers, journalists and urban planners pulling from those databases downstream have no way to know which version carries the correct date or attribution.
Specialists in the field point to a structural cause: the acceleration of digitisation during and after the Paris 2024 Olympic period, when several arrondissement-level cultural institutions received emergency funding to get collections online quickly. Speed and quality control made poor partners. One widely cited estimate within the sector, drawn from a 2025 audit commissioned by the Île-de-France region, suggested that between 12 and 18 percent of images ingested during accelerated digitisation rounds contained some form of duplicate or metadata error — though that figure has not been independently verified by The Daily Paris.
The Musée Carnavalet, which relaunched its online collections portal in 2023 after a major renovation, is considered something of a local benchmark for getting the process right. Its curatorial team built deduplication checks directly into the ingest workflow. But smaller institutions — a mairie annexe in the 19th, a local history society in Vincennes — lack the staffing to replicate that approach.
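For readers who want a sense of what an ingest-time deduplication check can look like, the sketch below shows one simple approach: fingerprinting each file's bytes and refusing to create a second record for content already held. It is an illustration only, not a description of the Musée Carnavalet's system. The `IngestIndex` class, its method names and the record identifiers are all hypothetical, and a byte-level fingerprint catches only exact copies of a file, not rescans or crops.

```python
import hashlib


class IngestIndex:
    """Hypothetical in-memory index from content fingerprint to record identifier."""

    def __init__(self):
        self._by_digest = {}

    def ingest(self, record_id, image_bytes):
        """Register an image at ingest time.

        Returns the identifier of an already-registered record with
        byte-identical content, or None if the image is new.
        """
        digest = hashlib.sha256(image_bytes).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            # Same bytes already catalogued: report it instead of adding a second record.
            return existing
        self._by_digest[digest] = record_id
        return None


# Usage with made-up identifiers and placeholder bytes.
index = IngestIndex()
index.ingest("rec-001", b"scan of a facade")          # new image, returns None
print(index.ingest("rec-002", b"scan of a facade"))   # prints rec-001
```

A check of this kind is cheap to run, which is part of why it suits smaller teams, but it would need a persistent store rather than a dictionary in any real workflow.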
What needs to happen, and who says so
The conversation has shifted from diagnosis to prescription. The Agence nationale de la cohésion des territoires, which oversees digital infrastructure grants for local authorities, is now understood to be reviewing whether future funding rounds for heritage digitisation should require applicants to submit a deduplication protocol before money is released. No formal policy change has been announced as of July 4, 2026, but the direction is being discussed openly at the sector level.
Separately, Sciences Po's médialab in the 7th arrondissement has been developing open-source tooling designed specifically to identify near-duplicate images in archival collections — software that can flag visually similar images even when their metadata differs. Researchers there have been in contact with the BnF and with Paris Musées, the umbrella body that coordinates the collections of 14 city-owned museums. Whether those conversations produce a formal partnership is still unclear.
For institutions managing their own collections today, practitioners advise three immediate steps: audit ingestion logs going back to 2020, apply perceptual hashing tools to existing image libraries to surface visual duplicates, and freeze any new cross-institutional data-sharing until a single metadata standard is agreed. The cost of doing nothing compounds. Every new portal built on a dirty dataset exports the problem further down the chain — and in a city that has staked significant cultural and economic capital on making its archives publicly accessible, that is a risk no one in the sector wants to be responsible for.
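The second of those steps, perceptual hashing, can be illustrated with a minimal sketch. The version below uses an "average hash", one of the simplest perceptual hashing techniques: each pixel of a small grayscale thumbnail becomes one bit, set when the pixel is brighter than the image's mean. It assumes the image has already been reduced to a small grid of grayscale values (for instance 8 by 8), which in practice an imaging library would do. The function names and the distance threshold of 5 are illustrative assumptions, not settings drawn from any tool named in this article.

```python
def average_hash(pixels):
    """Compute an average hash from a grid of grayscale values (0-255).

    `pixels` is a list of rows, assumed to be an already downscaled image.
    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def find_near_duplicates(hashes, max_distance=5):
    """Return pairs of record ids whose hashes differ by at most max_distance bits.

    `hashes` maps record id -> hash. The threshold is an assumed value
    that would need tuning against a real collection.
    """
    ids = sorted(hashes)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if hamming_distance(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs


# Usage with synthetic 8x8 "images": a gradient, a brighter copy, and a mirror image.
base = [[r * 8 + c for c in range(8)] for r in range(8)]
brighter = [[v + 10 for v in row] for row in base]
inverted = [[63 - v for v in row] for row in base]

hashes = {
    "rec-A": average_hash(base),
    "rec-B": average_hash(brighter),
    "rec-C": average_hash(inverted),
}
print(find_near_duplicates(hashes))  # [('rec-A', 'rec-B')]
```

The brighter copy hashes identically to the original because every pixel shifts along with the mean, which is the property that lets this family of techniques surface the same photograph scanned twice under different metadata. Comparing every pair scales poorly on large libraries, so a production audit would need an indexing strategy on top of this.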