The Daily Paris

Paris Archives in Crisis: What Officials, Experts and Key Figures Are Saying About the Duplicate Image Problem

From the Bibliothèque nationale de France to the Hôtel de Ville, digitisation professionals are sounding the alarm over thousands of mislabelled and duplicated images clogging public heritage databases.

By Paris News Desk · Published 4 July 2026, 8:47 pm

Photo: amine photographe on Pexels

Paris's public heritage institutions are sitting on a quietly expanding problem. Across several major digitisation programmes, duplicate and misattributed images have been accumulating in shared databases for years — and the reckoning is arriving now, as the city prepares to open new archive portals tied to the post-Olympics urban legacy push along the Seine.

The issue surfaced most visibly this spring, when the Bibliothèque nationale de France flagged internal inconsistencies in its Gallica platform, which hosts more than nine million digitised documents. Technical staff identified significant clusters of duplicate image records — in some cases the same photograph catalogued under three or four separate identifiers — that were distorting search results and inflating collection counts. The BnF has not published a formal error report, but the problem is an open subject at heritage sector conferences in the 11th and 13th arrondissements, where archivists and digital humanities scholars have spent recent weeks debating both causes and fixes.

Why the problem is worse than it looks

Duplicate images are not just a tidiness issue. When public institutions share datasets — as Paris's mairies, the Musée Carnavalet on the Rue des Francs-Bourgeois, and the Direction régionale des affaires culturelles increasingly do under Grand Paris interoperability frameworks — a single duplicated record can propagate across multiple systems simultaneously. A photograph of a Haussmann-era façade in the 9th arrondissement, entered twice under slightly different metadata, effectively becomes two competing authoritative records. Researchers, journalists and urban planners pulling from those databases downstream have no way to know which version carries the correct date or attribution.

Specialists in the field point to a structural cause: the acceleration of digitisation during and after the Paris 2024 Olympic period, when several arrondissement-level cultural institutions received emergency funding to get collections online quickly. Speed and quality control made poor partners. One widely cited estimate within the sector, drawn from a 2025 audit commissioned by the Île-de-France region, suggested that between 12 and 18 percent of images ingested during accelerated digitisation rounds contained some form of duplicate or metadata error — though that figure has not been independently verified by The Daily Paris.

The Musée Carnavalet, which relaunched its online collections portal in 2023 after a major renovation, is considered something of a local benchmark for getting the process right. Its curatorial team built deduplication checks directly into the ingest workflow. But smaller institutions — a mairie annexe in the 19th, a local history society in Vincennes — lack the staffing to replicate that approach.
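A check of this kind can be sketched in a few lines. The example below is a hypothetical illustration, not the Musée Carnavalet's actual code. It assumes each incoming image arrives as raw file bytes alongside a proposed catalogue identifier; the `IngestRegistry` class and the `CARN-` identifiers are invented for illustration. Any file whose SHA-256 digest matches one already catalogued is blocked before it can become a second record.

```python
import hashlib
from typing import Optional


class IngestRegistry:
    """Hypothetical ingest-time duplicate check keyed on file content.

    Assumption: each image arrives as raw bytes with a proposed identifier.
    Illustrative sketch only, not any institution's real workflow.
    """

    def __init__(self) -> None:
        # Maps the SHA-256 digest of the file bytes to the first identifier seen.
        self._by_digest: dict[str, str] = {}

    def ingest(self, identifier: str, data: bytes) -> Optional[str]:
        """Register an image; return the existing identifier if it is a duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            # Byte-identical file already catalogued: refuse the new record.
            return existing
        self._by_digest[digest] = identifier
        return None


registry = IngestRegistry()
print(registry.ingest("CARN-0001", b"facade-scan-bytes"))  # None (new image)
print(registry.ingest("CARN-0002", b"facade-scan-bytes"))  # CARN-0001 (duplicate)
```

Exact hashing only catches byte-identical files. A re-scan, a crop or a re-saved JPEG yields a different digest, which is why practitioners also use perceptual hashing to surface visual duplicates.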

What needs to happen, and who says so

The conversation has shifted from diagnosis to prescription. The Agence nationale de la cohésion des territoires, which oversees digital infrastructure grants for local authorities, is now understood to be reviewing whether future funding rounds for heritage digitisation should require applicants to submit a deduplication protocol before money is released. No formal policy change has been announced as of 4 July 2026, but the direction is being discussed openly at the sector level.

Separately, Sciences Po's médialab in the 7th arrondissement has been developing open-source tooling designed specifically to identify near-duplicate images in archival collections — software that can flag visually similar images even when their metadata differs. Researchers there have been in contact with the BnF and with Paris Musées, the umbrella body that coordinates the collections of 14 city-owned museums. Whether those conversations produce a formal partnership is still unclear.

For institutions managing their own collections today, practitioners advise three immediate steps: audit ingestion logs going back to 2020, apply perceptual hashing tools to existing image libraries to surface visual duplicates, and freeze any new cross-institutional data-sharing until a single metadata standard is agreed. The cost of doing nothing compounds. Every new portal built on a dirty dataset exports the problem further down the chain — and in a city that has staked significant cultural and economic capital on making its archives publicly accessible, that is a risk no one in the sector wants to be responsible for.
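Perceptual hashing can be illustrated with a difference hash (dHash), one common variant. The article does not say which algorithm any institution or the médialab tooling uses. The sketch below is self-contained and assumes each image has already been decoded to a grid of grayscale values from 0 to 255 (in practice an imaging library would handle decoding and resizing). The function names and the 10-bit distance threshold are illustrative assumptions, not a sector standard.

```python
def dhash(pixels: list[list[int]], hash_size: int = 8) -> int:
    """Difference hash of a grayscale image given as rows of 0-255 values.

    The image is shrunk to (hash_size + 1) x hash_size by nearest-neighbour
    sampling; each bit records whether a pixel is brighter than its
    right-hand neighbour.
    """
    height, width = len(pixels), len(pixels[0])
    cols, rows = hash_size + 1, hash_size
    small = [
        [pixels[r * height // rows][c * width // cols] for c in range(cols)]
        for r in range(rows)
    ]
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(hashes: dict[str, int], max_distance: int = 10) -> list[tuple[str, str, int]]:
    """Flag every pair of images whose hashes differ by at most max_distance bits.

    The threshold of 10 (out of 64 bits) is an assumed, tunable value.
    """
    ids = sorted(hashes)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = hamming(hashes[a], hashes[b])
            if d <= max_distance:
                pairs.append((a, b, d))
    return pairs


# Synthetic 64x64 test images: a left-to-right gradient, a brightened copy
# (a stand-in for a re-scan) and a mirrored copy.
orig = [[c * 3 for c in range(64)] for _ in range(64)]
bright = [[v + 20 for v in row] for row in orig]
mirrored = [row[::-1] for row in orig]
library = {"orig": dhash(orig), "bright": dhash(bright), "mirrored": dhash(mirrored)}
print(near_duplicates(library))  # [('bright', 'orig', 0)]
```

A uniform brightness shift leaves the hash unchanged, which is what lets a re-scanned print match its original, while a mirrored copy scores as maximally different, so the technique has blind spots. Comparing every pair is quadratic in collection size; a large library would more plausibly index its hashes (for example in a BK-tree) than loop over all pairs.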

About this article

Published by The Daily Paris

This article was produced by The Daily Paris editorial desk and covers news in Paris. See our editorial standards for how we use AI.