Paris city hall notified cultural institutions across the capital this spring that its unified digital heritage archive, maintained under the Paris Musées open-access platform, had accumulated an estimated 340,000 duplicate image records. These redundant scans, reprocessed photographs, and mirror uploads were quietly inflating storage costs and degrading search results for the roughly 1.2 million users who access the collections annually.
The problem is not cosmetic. Duplicate imagery in public digital archives distorts indexing, skews algorithmic recommendations, and in some cases causes legally distinct works — two separately photographed versions of the same Haussmann-era façade on the Rue de Rivoli, for instance — to be collapsed into a single record, erasing provenance data that curators spent years assembling. For a city that has made the post-2024 Olympics digital legacy a centrepiece of its cultural policy, a messy backend archive is a visible embarrassment.
What Paris Is Actually Doing
The city's response has been routed through two specific programmes. The first is a deduplication protocol launched in March 2026 by Paris Musées in coordination with the Bibliothèque historique de la Ville de Paris, based in the Marais on the Rue des Francs-Bourgeois. That protocol uses perceptual hashing, a technique that compares image fingerprints rather than file names, to flag near-identical records for human review. The second is a broader Seine-Saint-Denis digitisation audit tied to the Grand Paris Express communication infrastructure rollout. That effort has brought suburban cultural archives from Aubervilliers and Saint-Denis into central city systems for the first time, sharply increasing the volume of potentially duplicated material flowing into shared databases.
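To make the idea of an image fingerprint concrete, here is a minimal Python sketch of one common perceptual hash, the difference hash. The city has not said which algorithm or threshold its protocol uses, so the choice of algorithm, the function names, the tiny pixel grids and the `max_distance` value are all assumptions made for illustration.

```python
# Illustrative sketch only: a difference hash ("dHash") over grayscale pixel grids.
# The algorithm, names and threshold are assumptions, not the Paris Musées protocol.

def dhash(pixels):
    """Return a bit string: 1 where a pixel is brighter than its right-hand neighbour."""
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append("1" if left > right else "0")
    return "".join(bits)


def hamming(a, b):
    """Count the positions at which two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def flag_for_review(pixels_a, pixels_b, max_distance=2):
    """Flag a pair for human review when their hashes differ by at most max_distance bits."""
    return hamming(dhash(pixels_a), dhash(pixels_b)) <= max_distance


# A tiny 3x4 "image", a uniformly brightened rescan of it, and an unrelated image.
original = [[10, 40, 30, 90], [200, 150, 160, 20], [5, 5, 80, 70]]
rescan = [[value + 12 for value in row] for row in original]
unrelated = [[90, 30, 40, 10], [20, 160, 150, 200], [70, 80, 5, 5]]
```

In this toy case the brightened rescan produces the same hash as the original and would be flagged, while the unrelated image differs in 8 of 9 bits and would not. The function only flags a pair and never merges it, which matches the human-review step described above.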
The Bibliothèque historique alone holds more than 800,000 digitised items. Merging its photographic collections with those of the Musée Carnavalet — which reopened after a major renovation in 2021 and has since added tens of thousands of newly scanned objects to the open-access network — created the conditions for the duplication crisis now being addressed.
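Officials have set a target of reducing confirmed duplicate records by 60 percent before the end of 2026, a deadline tied to a wider digital infrastructure review scheduled for the fourth quarter. Applied to the estimated 340,000 duplicates, that would mean clearing roughly 204,000 records, although the target is defined against confirmed rather than estimated duplicates. The city has not disclosed a contract figure for the deduplication work.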
How Paris Compares to London, Amsterdam and New York
Paris is not alone, but it is operating at a different scale and under different political pressures than most peer cities. Europeana, the EU-funded pan-European digital culture aggregator headquartered in The Hague, flagged in its 2025 annual report that duplicate content across member institution feeds had reached approximately 8 percent of total indexed records — a figure that archivists describe as unsustainably high for a platform approaching 60 million objects.
London's Victoria and Albert Museum undertook a comparable deduplication exercise in 2023 and 2024, focusing on its Photography Collection, and publicly documented the process in a blog series — a level of transparency that Paris Musées has not yet matched. Amsterdam's Rijksmuseum, whose own open-access image library is widely cited as a benchmark for municipal digital collections, has operated automated deduplication as a continuous background process since 2019, preventing accumulation rather than conducting periodic purges.
New York's Metropolitan Museum of Art made a similar move, integrating deduplication into its Collections API pipeline after a 2021 audit revealed thousands of redundant entries in its open-access photography records. The common thread across these institutions is that reactive, one-time cleanups are far more expensive and disruptive than embedding the process into routine data governance from the start. It is a lesson Paris appears to be absorbing, if later than some of its counterparts.
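The difference between a periodic purge and a check built into ingest can be sketched in a few lines of Python. This is not the pipeline of the Rijksmuseum, the Met or any other institution named here. The class, the inventory numbers and the exact-match rule are hypothetical. For simplicity it fingerprints files with SHA-256, which catches only byte-identical mirror uploads; reprocessed or rescanned images would need a perceptual hash instead.

```python
# Hypothetical sketch: class name, record fields and matching rule are assumptions.
import hashlib
from dataclasses import dataclass, field


@dataclass
class IngestIndex:
    """Checks each incoming record against fingerprints already seen."""
    seen: dict = field(default_factory=dict)          # fingerprint -> first inventory number
    review_queue: list = field(default_factory=list)  # (incoming, existing) pairs

    def ingest(self, inventory_number, image_bytes):
        """Accept a new record, or queue it for review if identical bytes were seen before."""
        fingerprint = hashlib.sha256(image_bytes).hexdigest()
        existing = self.seen.get(fingerprint)
        if existing is None:
            self.seen[fingerprint] = inventory_number
            return "accepted"
        # Never merge automatically: both inventory numbers are kept for a curator.
        self.review_queue.append((inventory_number, existing))
        return "queued"


index = IngestIndex()
results = [
    index.ingest("INV-001", b"scan-a"),
    index.ingest("INV-002", b"scan-b"),
    index.ingest("INV-003", b"scan-a"),  # byte-identical mirror upload of INV-001
]
```

Because the check runs on every upload, duplicates are caught one at a time as they arrive rather than accumulating until a cleanup is needed. Queuing the pair instead of merging it keeps both inventory numbers, and their provenance data, intact until a person decides.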
For Parisians and researchers who rely on these databases, the practical advice for now is straightforward: when using the Paris Musées open-access portal, search by inventory number rather than keyword if precision matters, since keyword searches remain more vulnerable to returning duplicate or merged records during the cleanup period. The city has said the portal will display a notice when a searched collection is under active deduplication review; that notice system is expected to go live by September 2026.