The Daily Paris

Paris news, every day

Paris Takes a Hard Look at Duplicate Images in Its Public Archive — and Finds It's Behind London and Amsterdam

As cities race to digitise their visual heritage, Paris is grappling with a sprawling problem of redundant imagery that is slowing archival access and costing institutions real money.

By Paris News Desk · Published 4 July 2026, 8:44 pm

The Bibliothèque nationale de France holds more than 15 million digitised images in its Gallica platform. A growing share of them, archivists have acknowledged internally, are exact or near-exact duplicates — the same Haussmann-era façade scanned twice, the same 1900 Exposition Universelle print catalogued under three separate accession numbers. The duplication problem is not a quirk; it is a structural consequence of how Paris absorbed decades of piecemeal digitisation projects funded by successive municipal and national programmes that rarely talked to each other.

The issue is pressing right now for a specific reason. The post-Olympics legacy agenda, formally tied to Paris 2024's commitment to expanding cultural access through the Grand Paris infrastructure build, has pushed the city to consolidate its digital public assets before the next wave of European cultural funding arrives in early 2027. Duplicates in image databases inflate storage costs, degrade search results and, most critically, push legitimate archival material further from public reach. For institutions applying for Horizon Europe grants, a bloated and unreliable image catalogue is a liability they can no longer quietly absorb.

What Paris Is Doing — and Where It Falls Short

Two organisations are leading the practical response. Paris Musées, the federation that runs 14 municipal museums including the Musée Carnavalet on Rue des Francs-Bourgeois and the Petit Palais on Avenue Winston Churchill, launched a deduplication audit in the spring of 2025 covering its open-access image collection of roughly 300,000 items. The project uses perceptual hashing — a technique that generates a fingerprint for each image based on visual content rather than file metadata — to flag near-duplicates for human review. The BnF, separately, has been piloting a comparable tool within the Gallica infrastructure since late 2024, though the two institutions are not yet sharing methodology or outputs in any formal way.
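For readers curious how a visual fingerprint can be compared, here is a minimal sketch of one common perceptual-hashing variant, the "average hash". It is an illustration only: the article does not say which algorithm Paris Musées or the BnF use, and the function names, the 4x4 sample grids and the review threshold are all assumptions made for the example.

```python
# Illustrative average-hash sketch. Not the tool used by Paris Musées or the BnF.

def average_hash(pixels):
    """Return a bit-string fingerprint for a small grayscale image.

    `pixels` is a list of rows of 0-255 brightness values, assumed to be
    already downscaled to a tiny grid before hashing.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    # One bit per pixel: 1 if brighter than the image's mean, else 0.
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_near_duplicate(pixels_a, pixels_b, max_distance=2):
    """Flag a pair for human review when the fingerprints nearly match.

    `max_distance` is a hypothetical threshold chosen for this example.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Hypothetical 4x4 "scans": the second is the first, slightly brighter,
# as might happen when the same print is digitised twice.
scan_one = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
scan_two = [[min(255, value + 8) for value in row] for row in scan_one]

# A visually different image (a checkerboard pattern).
unrelated = [
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
]

print(is_near_duplicate(scan_one, scan_two))   # True
print(is_near_duplicate(scan_one, unrelated))  # False
```

Because the fingerprint depends on relative brightness rather than on file names or accession numbers, the two scans match even though their pixel values differ. That is the property that lets such a tool catch items catalogued under separate records, while the final call still goes to a human reviewer.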

That fragmentation is precisely where Paris diverges from cities further along the curve. Amsterdam's Rijksmuseum digitised its collection under a single unified rights and metadata framework, meaning its 700,000-plus public domain images were deduplicated at the point of ingestion rather than retrospectively. The result: a search for a Rembrandt study on the Rijksstudio returns clean, distinct results with no redundant entries. The British Museum in London, drawing on its Collection Online platform, completed a metadata harmonisation project in 2023 that cut duplicate image records by an estimated 12 percent across its 1.9 million object entries. Paris has no equivalent cross-institution harmonisation date on the public calendar.

The cost gap matters. Cloud storage for cultural institutions in France runs at roughly €0.02 per gigabyte per month under standard government-negotiated contracts. That is negligible at small scale, but significant when you are maintaining redundant high-resolution TIFF files across multiple servers in Tolbiac and the BnF's satellite depot in Bussy-Saint-Georges, Seine-et-Marne. Duplicates do not just waste storage; they also demand ongoing cataloguing labour to reconcile incorrect or conflicting metadata fields.
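To see how a two-cent rate adds up, the arithmetic can be sketched as below. Only the €0.02 per gigabyte per month figure comes from the reporting above. The number of duplicate files, the average file size and the number of server copies are hypothetical inputs chosen purely to show the calculation, not estimates for any institution.

```python
# Rate quoted in the article; every other number below is a hypothetical input.
RATE_EUR_PER_GB_MONTH = 0.02


def monthly_duplicate_cost(duplicate_files, avg_file_gb, copies=1,
                           rate=RATE_EUR_PER_GB_MONTH):
    """Estimate the monthly storage cost, in euros, of redundant files.

    duplicate_files: number of redundant image files (assumed)
    avg_file_gb:     average size of one file in gigabytes (assumed)
    copies:          number of servers holding each file (assumed)
    """
    return duplicate_files * avg_file_gb * copies * rate


# Hypothetical scenario: 50,000 redundant TIFFs of 0.2 GB each, kept on 2 servers.
monthly = monthly_duplicate_cost(50_000, 0.2, copies=2)
print(f"about EUR {monthly:,.0f} per month, EUR {monthly * 12:,.0f} per year")
```

Under those assumed inputs the redundant files would cost about €400 a month, or roughly €4,800 a year, before counting any cataloguing labour.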

Lessons From Abroad, Applied to the Marais and Beyond

The comparison with New York's Metropolitan Museum of Art is instructive. The Met opened its entire open-access image collection — some 490,000 works — in 2017 and built deduplication into the export pipeline from the start, assigning each object a persistent digital identifier that travels with every derivative image file. Paris Musées began assigning persistent identifiers to its objects only in 2022, meaning years of earlier digitisation output exists outside that system and must be reconciled manually.

For researchers and designers who use these archives — freelancers pulling historical maps of the 11th arrondissement, editorial photographers checking rights status on Belle Époque portraits — the practical effect is wasted hours and occasional mislicensing. Creative Commons advocacy groups operating in France have flagged the issue to the Direction générale des médias et des industries culturelles, though no formal regulatory response has been published.

The next concrete milestone comes in September 2026, when Paris Musées is scheduled to publish the first phase of its deduplication audit results. Whether the BnF coordinates a parallel release — or continues operating on its own timeline — will signal how seriously the city's cultural institutions have absorbed the lesson that Amsterdam and London learned earlier: cleaning the archive is not a housekeeping task. It is infrastructure.

About this article

Published by The Daily Paris

This article was produced by The Daily Paris editorial desk and covers news in Paris. See our editorial standards for how we use AI.
