The Daily Paris

Paris news, every day


Paris Archives and Cultural Institutions Move to Stamp Out Duplicate Images in Digital Collections This Week

A coordinated push across several Parisian heritage bodies is forcing a long-overdue reckoning with bloated, redundant photo archives — and the public may soon benefit from it.

By Paris News Desk · Published 4 July 2026, 8:28 pm


Photo: MacLean, Christopher D. / Public domain (Wikimedia Commons)

Three of Paris's largest publicly funded cultural institutions confirmed this week that they are running deduplication audits on their digitised image libraries. This quiet but consequential housekeeping exercise has gained urgency as the Grand Paris Express construction project generates thousands of new documentary photographs every month. The Bibliothèque nationale de France (BnF), the Musée Carnavalet on the Rue des Francs-Bourgeois in the Marais, and the Institut national de l'audiovisuel (INA) each have separate but overlapping clean-up operations underway, according to publicly available programme updates and institutional newsletters reviewed by The Daily Paris.

The timing is not accidental. Since the Paris 2024 Olympic Games, France's heritage sector has been under political pressure to make its digitised collections more accessible and less redundant. The legacy activation programme tied to the Games committed participating institutions to improving open-access catalogues by mid-2026, and duplicate imagery — sometimes the same archival photograph appearing under four or five different catalogue entries — has been identified internally as a primary obstacle to clean, searchable databases.

Why Duplicates Became a Crisis

The problem accumulated over decades. When the BnF launched its Gallica platform in 1997, digitisation was expensive and done in isolated batches. Photographs were scanned multiple times by different departments, uploaded with inconsistent metadata, and catalogued separately. By the time standardisation protocols arrived, the damage was done. Industry estimates for comparable national library collections in Europe suggest that duplicate or near-duplicate images can make up anywhere from 12 to 18 percent of a digitised photography archive. If a rate in that range applied to Gallica's holdings of several million images, redundant files would number at least in the hundreds of thousands, consuming server capacity and muddying search results.

The Musée Carnavalet, which holds the principal photographic record of Paris's own urban history, has particular reason to act. Its collection spans imagery from Haussmann's 19th-century transformations of the city through to documentation of the Seine riverbanks before and after the urban regeneration projects of the 2010s. Curators there have been working since March 2026 with a Paris-based AI image-recognition firm — whose contract was approved by the City of Paris cultural directorate — to flag near-identical scans for human review before deletion or consolidation.
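How the Carnavalet's contractor detects near-identical scans is not specified. As a purely illustrative sketch, assuming a widely used technique called perceptual hashing (here a "difference hash"), the Python below shows how near-duplicates can be flagged for a curator rather than deleted automatically. The function names (`downscale`, `dhash`, `hamming`, `flag_near_duplicates`) and the five-bit threshold are our own hypothetical choices, and images are modelled as plain grids of grayscale values so the example runs on its own.

```python
from itertools import combinations

# Hypothetical illustration of near-duplicate flagging with a difference hash
# (dHash). This is NOT the Carnavalet contractor's method, which is unspecified.


def downscale(pixels, width, height):
    """Shrink a grayscale image (rows of 0-255 ints) by averaging pixel blocks.

    Assumes the source is at least width x height pixels.
    """
    src_h, src_w = len(pixels), len(pixels[0])
    small = []
    for y in range(height):
        y0, y1 = y * src_h // height, (y + 1) * src_h // height
        row = []
        for x in range(width):
            x0, x1 = x * src_w // width, (x + 1) * src_w // width
            block = [pixels[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def dhash(pixels, size=8):
    """Return a size*size-bit hash: each bit records whether a pixel is brighter
    than its right-hand neighbour in a (size+1) x size thumbnail."""
    bits = 0
    for row in downscale(pixels, size + 1, size):
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def flag_near_duplicates(images, max_distance=5):
    """Return (id, id, distance) for every pair whose hashes differ by at most
    max_distance bits. The result is meant for human review; nothing is deleted."""
    hashes = {image_id: dhash(pixels) for image_id, pixels in images.items()}
    flagged = []
    for a, b in combinations(sorted(hashes), 2):
        distance = hamming(hashes[a], hashes[b])
        if distance <= max_distance:
            flagged.append((a, b, distance))
    return flagged


# Toy "scans": a gradient, a brighter re-scan of it, and a mirrored print.
scan_a = [[7 * x + 3 * y for x in range(18)] for y in range(16)]
scan_b = [[min(255, v + 20) for v in row] for row in scan_a]
scan_c = [row[::-1] for row in scan_a]
print(flag_near_duplicates({"scan_a": scan_a, "scan_b": scan_b, "scan_c": scan_c}))
# [('scan_a', 'scan_b', 0)]
```

A brighter re-scan changes every pixel value but not which neighbour is brighter, so it hashes identically, while the mirrored print differs in all 64 bits. Comparing every pair grows quadratically with collection size, so an archive of millions of images would need an index over the hashes, which this sketch leaves out.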

What the Cleanup Means Practically

For researchers and the general public, the practical effects should be noticeable within the next six to twelve months. Searches on open-access platforms are expected to return cleaner, more varied results rather than ten slightly different scans of the same Montmartre street corner from 1923. The INA, which manages France's national radio and television archive, faces the additional challenge of duplicate frames within video files. That problem is technically distinct from duplicate still images but analogous to it, and the institute is addressing it through frame-hash comparison software first piloted in 2024.
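The INA's software is not described in detail. A minimal sketch, assuming the simplest reading of "frame-hash comparison" (an exact SHA-256 hash of each decoded frame, which is our assumption rather than the institute's documented method), groups repeated frames as follows. `find_duplicate_frames` is a hypothetical name, and the short byte strings stand in for real decoded frame data.

```python
import hashlib
from collections import defaultdict


def find_duplicate_frames(frames):
    """Group frame indices whose raw bytes share the same SHA-256 digest.

    Hypothetical sketch: `frames` is a sequence of decoded frame buffers
    (bytes); a real pipeline would first decode the video into frames.
    Returns a sorted list of index groups, one group per repeated frame.
    """
    by_digest = defaultdict(list)
    for index, frame in enumerate(frames):
        by_digest[hashlib.sha256(frame).hexdigest()].append(index)
    # Keep only digests seen more than once: those frames are exact duplicates.
    return sorted(group for group in by_digest.values() if len(group) > 1)


frames = [b"black", b"title card", b"title card", b"scene 1", b"black"]
print(find_duplicate_frames(frames))  # [[0, 4], [1, 2]]
```

Exact hashing only catches byte-identical frames. Frames that differ slightly, for instance through compression noise after re-encoding, would need a perceptual hash instead.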

Budget allocations matter here. The City of Paris earmarked €4.2 million in its 2026 cultural infrastructure budget for digital archive modernisation across its network of municipal museums, of which Carnavalet is the flagship. A portion of that envelope is understood to cover licensing costs for deduplication software and the personnel hours required for post-algorithm human review — the stage that archivists insist cannot be automated away entirely.

For anyone hoping to access these collections — students at Sciences Po or the Sorbonne, documentary filmmakers, journalists — the practical advice right now is to use the existing platforms with the awareness that search results may shift considerably once audits conclude. Gallica already allows filtered searches by collection type and date; cross-referencing results with the Carnavalet's own online portal at carnavalet.paris.fr can help identify whether the same image exists in a cleaner, better-described version. The deduplication work is unglamorous, but the Paris institutions doing it this week are making a bet that cleaner archives make for a more credible public record — and that a photograph worth keeping is worth keeping only once.



About this article

Published by The Daily Paris

This article was produced by The Daily Paris editorial desk and covers news in Paris. See our editorial standards for how we use AI.
