The Daily Paris

Paris news, every day

Paris Archives Battle Thousands of Duplicate Images Flooding Digital Databases

A surge in digitised duplicate photographs is clogging public databases and forcing Parisian institutions to act fast.

By Paris News Desk · Published 4 July 2026, 8:36 pm

Photo: Serhii Kovalov on Pexels

Paris's major cultural archives are confronting a growing technical headache this week: thousands of duplicate images have accumulated across publicly accessible digital collections, undermining search reliability and inflating storage costs at a moment when institutions are under pressure to make their holdings accessible for free. The Bibliothèque nationale de France and the Paris Musées network — which pools the digital collections of 14 municipal museums — have both confirmed they are actively running deduplication processes on their online databases, according to internal communications reviewed by staff at those institutions.

The problem has been building for at least two years. The post-Olympics digitisation push that followed Paris 2024 injected tens of thousands of new image files into public repositories almost simultaneously. Institutions racing to upload photographs of legacy events, urban regeneration projects along the Seine, and Grand Paris Express construction sites often scanned the same source material multiple times using different equipment, creating near-identical copies that standard metadata tagging failed to catch. The result: a researcher querying the BnF's Gallica platform today can encounter the same 19th-century engraving of the Pont Neuf listed under four separate catalogue entries.

Where the Duplicates Are Piling Up

The problem is not confined to historical material. Paris Musées, whose digital portal at parismusees.paris.fr hosts more than 300,000 open-licence images, identified roughly 8,400 suspected duplicate entries in an internal audit completed in late June 2026. The Musée Carnavalet — the city's museum of history, located on Rue de Sévigné in the Marais — contributed the largest share of suspect files, largely because its photographic holdings covering Haussmann-era demolitions were digitised in three separate project phases between 2019 and 2024.

At the BnF's Richelieu site on Rue de Richelieu in the 2nd arrondissement, technical staff have been running perceptual hashing algorithms since the week of 23 June to compare image fingerprints across Gallica's 9-million-item catalogue. Perceptual hashing converts each image into a compact numerical signature that changes little when the image is resized, recompressed or lightly retouched. Files with near-identical signatures are flagged for human review rather than automatically deleted, because archivists argue that minor differences, such as a crop or a contrast adjustment, can carry documentary significance.
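
To make the idea concrete, here is a minimal sketch of one simple perceptual hash, the "average hash". It is an illustration only: the article does not say which algorithm the BnF uses. The function names, the 8x8 signature size and the review threshold of 5 differing bits are all assumptions made for this example. Images are represented as plain grids of grayscale values so the sketch runs without any imaging library.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging equal blocks.

    Assumes, for simplicity, that width and height are multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit signature: one bit per cell, set if above the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two signatures."""
    return bin(a ^ b).count("1")


def is_suspected_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag a pair for human review; the threshold is a hypothetical choice."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic 16x16 test images: a diagonal gradient, a uniformly brightened
# copy (standing in for a rescan with different exposure), and a negative.
original = [[8 * x + 8 * y for x in range(16)] for y in range(16)]
brighter = [[v + 10 for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]
```

With this synthetic data, `is_suspected_duplicate(original, brighter)` returns `True`, while `is_suspected_duplicate(original, inverted)` returns `False`. The function only flags pairs and deletes nothing, which mirrors the human-review step described above.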

The financial stakes are real. Cloud storage costs for the BnF's digital division have risen by an estimated 18 percent year-on-year since 2023, according to figures cited in the institution's 2025 annual report. Deduplication, if completed across the full Gallica catalogue, could reclaim storage equivalent to several hundred terabytes — a meaningful saving at a time when the culture ministry's digital infrastructure budget faces pressure from National Assembly spending reviews.

What Researchers and the Public Should Expect

For anyone using these platforms in the coming weeks, there will be visible disruption. Paris Musées has already temporarily withdrawn around 1,200 image records from public view while its team resolves conflicting metadata between duplicate pairs. The Carnavalet images most affected include photographs of the Marais district taken between 1860 and 1900 — precisely the material most in demand from architectural researchers and documentary filmmakers working on Seine-Saint-Denis urban history projects.

The BnF is taking a slower, staged approach. Rather than pulling records offline, it is appending a visible warning flag, described internally as a "doublon suspect" ("suspected duplicate") tag, to entries under review. Users searching Gallica from 7 July onward will see those flags appear in search results alongside affected items. The institution expects the first phase of human review to be complete by 1 September 2026, with a full resolution timeline running into early 2027.

For independent researchers and heritage professionals, the practical advice is straightforward: cross-reference any image sourced from Gallica or Paris Musées this summer against at least one secondary catalogue. The Roger-Viollet photographic agency archive and the Institut national de l'audiovisuel's image base are both considered reliable independent sources. Institutions building digital exhibitions or publications should download and locally preserve any specific image they are counting on, since flagged records carry a small but non-zero risk of temporary withdrawal during the review period.

The episode is prompting wider discussion inside the Grand Paris cultural sector about whether the rush to digitise during and after the 2024 Olympics was adequately resourced for quality control — a question that budget committees at both city hall and the culture ministry will likely be asked to address before the next major digitisation tender is issued.

About this article

Published by The Daily Paris

This article was produced by The Daily Paris editorial desk and covers news in Paris. See our editorial standards for how we use AI.
