Three of Paris's largest publicly funded image repositories moved this week to address a growing backlog of duplicate and misattributed photographs cluttering their digital archives, a problem that has quietly undermined public access to the city's visual heritage for years. The Bibliothèque nationale de France, the Musée Carnavalet on the Rue des Francs-Bourgeois, and the Paris Musées network all confirmed ongoing deduplication reviews, following a directive issued earlier this year under the European Union's revised digital cultural heritage framework.
The timing is not accidental. The EU's updated guidelines — part of the broader Europeana strategy for harmonising public collections across member states — set a soft compliance benchmark of September 2026 for institutions receiving public digitisation funding. Paris's cultural bodies, many of which received significant investment during the post-2024 Olympics legacy activation programme, are under pressure to show their digital infrastructure matches the ambition of their physical renovations.
What the Backlog Actually Looks Like
The scale of the problem is considerable. The Paris Musées network, which manages 14 municipal museums including the Petit Palais on Avenue Winston Churchill and the Musée d'Art Moderne on Avenue de New York, made roughly 300,000 digitised works freely available online under an open-licence policy introduced in 2020. That move was widely praised, but it also exposed a pre-existing mess: thousands of images uploaded multiple times under different catalogue numbers, with inconsistent date tags and conflicting attribution notes.
Deduplication in large image databases is not simply a matter of deleting obvious copies. The technical process involves perceptual hashing — software that detects near-identical images even when file sizes or compression levels differ — combined with manual curatorial review to confirm whether two similar images are genuinely the same photograph or distinct takes from the same session. At the BnF's Richelieu site on Rue de Richelieu, a dedicated digital team has been running this process on the Gallica platform since March, targeting the photography collections first before moving to maps and printed ephemera.
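For readers who want a sense of how perceptual hashing works, the following is a minimal sketch of an "average hash", one of the simplest variants. It is an illustration only: the article does not say which algorithm the BnF team uses, the function names are hypothetical, the images are plain grids of grayscale values rather than decoded files, and the threshold of 5 differing bits is an arbitrary assumption.

```python
# Sketch of an "average hash", one simple form of perceptual hashing.
# Images here are 2D lists of grayscale values (0-255), at least 8x8;
# decoding real image files would need an imaging library.

def shrink(pixels, size=8):
    """Downsample a grayscale grid to size x size by averaging blocks."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        y0, y1 = row * height // size, (row + 1) * height // size
        line = []
        for col in range(size):
            x0, x1 = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            line.append(sum(block) / len(block))
        small.append(line)
    return small


def average_hash(pixels, size=8):
    """Return an integer whose bits are 1 where a cell is brighter than the mean."""
    cells = [value for line in shrink(pixels, size) for value in line]
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def is_candidate_duplicate(a, b, threshold=5):
    """Flag a pair for curatorial review; the threshold is an assumed value."""
    return hamming(average_hash(a), average_hash(b)) <= threshold


# Synthetic test images: a horizontal gradient, a slightly brightened and
# noisy copy (standing in for recompression), and a vertical gradient.
original = [[x * 8 for x in range(32)] for y in range(32)]
recompressed = [[x * 8 + 3 + (x + y) % 2 for x in range(32)] for y in range(32)]
different = [[y * 8 for x in range(32)] for y in range(32)]

print(is_candidate_duplicate(original, recompressed))  # True
print(is_candidate_duplicate(original, different))     # False
```

A match on the hash only nominates a pair for review. Whether two near-identical frames are one photograph or two takes from the same session remains a curatorial decision, as described above.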
Industry estimates suggest that large heritage image databases typically carry a duplicate rate of between 8 and 15 percent, though the figure varies sharply depending on how aggressively collections were migrated from older systems. For a collection the size of Gallica, which holds more than 9 million digitised documents, even a 10 percent duplication rate, toward the lower end of that range, would mean roughly 900,000 redundant files consuming server space and degrading search results.
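Taking the figures quoted above at face value, the arithmetic can be sketched as follows. The 9 million count is treated as a lower bound, the rates are the industry estimates cited in the text rather than measured values for Gallica, and the variable names are illustrative.

```python
# Back-of-envelope estimate using the figures quoted in the article.
documents = 9_000_000        # lower bound: "more than 9 million"
rates_percent = [8, 10, 15]  # estimated range, plus the 10 percent case

# Integer arithmetic keeps the results exact.
estimates = {rate: documents * rate // 100 for rate in rates_percent}

for rate, count in estimates.items():
    print(f"{rate}% duplicates -> about {count:,} redundant files")
```

On these assumptions the redundant files number between 720,000 and 1,350,000, with the 10 percent case at 900,000.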
Why It Matters Beyond Tidiness
The practical consequences are real and affect ordinary users. Researchers at Sciences Po, teachers pulling images for school resources, and journalists searching the open collections all run into the same frustration: a search for, say, photographs of the Marais district in the 1950s returns a wall of near-identical thumbnails, making it hard to find genuinely distinct images. Worse, when the same image carries two different catalogue entries with conflicting dates or photographer credits, it creates downstream errors in academic citations and press usage.
The Seine-Saint-Denis archives, which hold significant documentation of the northern banlieues and have been expanding their digital presence as part of the publicity around the Grand Paris Express corridor development, face a related but distinct problem. Their images were digitised by different local communes before regional coordination existed, so the same photograph sometimes appears in three separate municipal databases with three different captions.
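One way to surface that kind of conflict is to group records by an image fingerprint and list every fingerprint that carries more than one caption. The sketch below assumes such a fingerprint is already available for each record; the database names, fingerprints, and captions are invented for illustration and do not come from any real archive.

```python
from collections import defaultdict

# Hypothetical records: (source database, image fingerprint, caption).
records = [
    ("commune_a", "f3a1", "Market square, 1954"),
    ("commune_b", "f3a1", "Place du marché, c. 1950"),
    ("commune_c", "f3a1", "Street scene, undated"),
    ("commune_a", "9c20", "Canal bridge, 1961"),
    ("commune_b", "9c20", "Canal bridge, 1961"),
]


def conflicting_captions(rows):
    """Map each fingerprint to its captions, keeping only those that disagree."""
    by_image = defaultdict(set)
    for _source, fingerprint, caption in rows:
        by_image[fingerprint].add(caption)
    return {fp: sorted(caps) for fp, caps in by_image.items() if len(caps) > 1}


conflicts = conflicting_captions(records)
for fingerprint, captions in conflicts.items():
    print(fingerprint, captions)
```

Here only the first image is flagged, since its three records disagree, while the second image's two records match and pass silently. Deciding which caption is correct still falls to an archivist.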
For anyone working with these collections now, the practical advice is straightforward. Cross-reference any image found through a Paris Musées or Gallica search against the Joconde national database before publishing or citing it — discrepancies in attribution are still common and the deduplication work will not be complete before autumn. Institutions expecting to use archival images for publications scheduled before October should build in extra verification time. The Paris Musées network has indicated it will publish a revised metadata guide later this month, which should at minimum clarify the preferred citation format once duplicate records are merged.