The Daily Paris

Paris news, every day

News

Paris Archives and Museums Launch Urgent Push to Purge Duplicate Digitised Images From Public Collections

A coordinated audit this week exposed thousands of redundant scanned files clogging the city's cultural databases, raising questions about how public institutions manage — and spend money on — digital heritage.

By Paris News Desk · Published 4 July 2026, 8:45 pm


Photo: Wolf Art on Pexels

Paris's network of publicly funded cultural institutions moved this week to confront a problem that has quietly inflated storage costs and confused researchers for years: tens of thousands of duplicate digitised images sitting inside databases that are meant to serve the public. The Paris Musées consortium, which oversees the digital collections of fourteen municipal museums including the Musée Carnavalet on the Rue des Francs-Bourgeois and the Petit Palais on the Avenue Winston Churchill, confirmed it is running a systematic audit of its open-access image library after an internal review identified a significant volume of redundant files.

The timing matters. France's national cultural digitisation programme, carried out under the broader Plan Numérique pour le Patrimoine, has poured substantial investment into making heritage images freely accessible since 2021. With Grand Paris Express construction disrupting visitor flows to peripheral museums, and post-2024 Olympics tourism refocusing attention on the city's cultural infrastructure, administrators must now show that earlier digitisation spending delivered clean, usable results rather than bloated, duplicate-riddled archives.

What the Audit Found This Week

The practical problem is straightforward: when institutions digitised physical collections in waves, often contracting different providers at different resolutions, the same painting, print or photograph ended up uploaded multiple times under slightly different filenames or metadata tags. Paris Musées' open-access portal, which went live in 2020 and currently offers more than 300,000 images under a Creative Commons licence, has accumulated duplicate entries that skew search results and push up cloud storage costs. Librarians and photo researchers accessing the portal through Europeana, the pan-European cultural aggregator, have flagged repeated instances of the same Eugène Atget photograph of the Marché des Patriarches appearing under two or three separate accession numbers.

The Bibliothèque nationale de France, whose Gallica platform hosts more than eight million digitised documents from its site on the Rue de Richelieu, faces a related but distinct version of the issue. Gallica's de-duplication protocols, last updated in 2023, rely on hash-matching technology that catches identical pixel-for-pixel copies but misses near-duplicates — the same image scanned at 300 dpi and again at 600 dpi, for instance. Technical staff confirmed this week that a remediation sprint is underway ahead of the platform's next major release, scheduled for autumn 2026.
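The gap between exact and near-duplicate matching can be illustrated with a short Python sketch. It is not Gallica's own method: it assumes greyscale images held as lists of pixel rows, and it swaps in a simple "average hash", one common perceptual-hashing technique, to show how a rescan at double resolution defeats a byte-level SHA-256 match but not a perceptual one. The function names and the synthetic 16-by-16 test pattern are hypothetical.

```python
import hashlib

def average_hash(pixels, size=8):
    """Average hash (aHash) of a greyscale image given as rows of 0-255 ints.

    The image is split into size x size cells; each bit records whether a
    cell is brighter than the mean of all cells. Assumes the image is at
    least size pixels tall and wide.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        for c in range(size):
            r0, r1 = r * height // size, (r + 1) * height // size
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def exact_digest(pixels):
    """Byte-level SHA-256: matches only identical pixel data."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

# Hypothetical stand-in for a 300 dpi scan: a synthetic 16x16 pattern.
scan_300 = [[(x * 16 + y * 7) % 256 for x in range(16)] for y in range(16)]
# Hypothetical stand-in for a 600 dpi rescan: same image, twice the resolution.
scan_600 = [[scan_300[y // 2][x // 2] for x in range(32)] for y in range(32)]

print(exact_digest(scan_300) == exact_digest(scan_600))         # False
print(hamming(average_hash(scan_300), average_hash(scan_600)))  # 0
```

In practice a small non-zero Hamming distance is usually tolerated, since real rescans differ slightly in exposure and cropping; the threshold is a tuning choice, not a fixed rule.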

Storage costs add up. Commercial cloud rates for cultural-sector contracts in France currently sit at roughly €0.02 per gigabyte per month for cold-tier archival storage, and a single high-resolution TIFF master file of a digitised painting can reach several gigabytes. Multiplied across hundreds of thousands of duplicated assets, the financial case for cleaning up archives becomes hard to ignore, particularly as the French Culture Ministry faces pressure from the National Assembly's finance committee over discretionary digital spending.
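The arithmetic behind that argument is simple multiplication, sketched below in Python. The €0.02 rate comes from the figures above; the 20,000 duplicates and the 3 GB average file size are hypothetical inputs chosen only to illustrate scale, and the function name is invented for the example. It covers storage alone and ignores the retrieval and request fees that cold tiers typically add.

```python
def monthly_storage_cost_eur(file_count, avg_size_gb, rate_eur_per_gb_month=0.02):
    """Storage-only monthly cost of keeping file_count files of avg_size_gb each.

    Retrieval, egress and request fees are deliberately left out.
    """
    return file_count * avg_size_gb * rate_eur_per_gb_month

# Hypothetical inputs: 20,000 duplicate TIFF masters averaging 3 GB each.
monthly = monthly_storage_cost_eur(20_000, 3)
print(round(monthly, 2))       # 1200.0
print(round(monthly * 12, 2))  # 14400.0
```

Because the cost scales linearly with both file count and file size, doubling either input doubles the bill.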

What Institutions Are Doing Next

Paris Musées said its audit, conducted with Ligeo Archives, a firm specialising in heritage data management, is expected to conclude by September 2026. Every suspected duplicate identified in the process will be flagged for human review before deletion, a safeguard against removing files that look alike but carry distinct provenance or rights metadata. The consortium is also working with the Direction des Affaires Culturelles de Paris, the city's cultural administration on the Rue Beaubourg, to standardise filename conventions across all fourteen museums before the next digitisation contract cycle opens in early 2027.
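A review-first workflow of the kind Paris Musées describes could be sketched as follows. This is a hypothetical illustration, not the consortium's or Ligeo Archives' system: the Record fields, the integer perceptual-hash values and the 4-bit distance threshold are all assumptions. Candidate pairs are queued for a human reviewer rather than deleted, and pairs whose provenance or rights metadata differ are flagged so both files can be kept.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    # Hypothetical catalogue fields; real collection schemas will differ.
    accession: str
    phash: int          # perceptual hash of the image, e.g. an average hash
    provenance: str
    rights: str

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def build_review_queue(records: list[Record], max_distance: int = 4) -> list[dict]:
    """Pair up likely duplicates for human review; nothing is deleted here.

    Pairs within max_distance bits are queued. If provenance or rights
    metadata differ, the pair is flagged so a reviewer can keep both files.
    """
    queue = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            distance = hamming(a.phash, b.phash)
            if distance <= max_distance:
                queue.append({
                    "pair": (a.accession, b.accession),
                    "distance": distance,
                    "metadata_differs": (a.provenance, a.rights) != (b.provenance, b.rights),
                })
    return queue

# Hypothetical sample records.
records = [
    Record("A-1", 0b1010, "Bequest", "CC0"),
    Record("A-2", 0b1011, "Bequest", "CC0"),
    Record("A-3", 0b1010, "Later donation", "CC0"),
    Record("B-1", 0xFFFF_FFFF, "Purchase", "CC0"),
]
for item in build_review_queue(records):
    print(item)
```

Comparing every pair is quadratic in the number of records, which is fine for a sketch; a collection of hundreds of thousands of images would need an index, such as a BK-tree over Hamming distance, to find candidates efficiently.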

For researchers and photo editors who regularly pull images from these collections, the practical advice this week is simple: cross-reference any image downloaded from Paris Musées Open Content or Gallica against its full metadata record, and flag suspected duplicates using each platform's reporting function. Both institutions say user-submitted reports have already helped identify clusters of redundant files that automated tools missed. The cleaned databases will go live progressively rather than in a single switch-over, so the collections will remain publicly accessible throughout the process.


About this article

Published by The Daily Paris

This article was produced by The Daily Paris editorial desk and covers news in Paris. See our editorial standards for how we use AI.
