The Daily Paris

Paris news, every day


How Paris Ended Up With Thousands of Duplicate Images Clogging Its Public Archives — and What's Being Done About It

A sprawling digitisation push launched ahead of the 2024 Olympics left the city's photo repositories riddled with near-identical files, and archivists are only now beginning to untangle the mess.

By Paris News Desk · Published 4 July 2026, 9:06 pm



Paris city hall has acknowledged a systemic problem inside its digital asset management infrastructure: tens of thousands of duplicate image files, many generated during a rushed digitisation campaign between 2022 and 2024, are consuming server capacity and distorting search results across at least three major municipal platforms.

The timing matters. The city spent heavily on digital modernisation in the run-up to the Paris 2024 Olympics, accelerating the ingestion of photographic archives from institutions including the Bibliothèque historique de la Ville de Paris on Rue Pavée and the Paris Musées network, which groups 14 municipal museums and releases their images under a single open-access licence. The pressure to have assets publicly searchable before the global spotlight arrived was real, and it produced shortcuts that are still being unwound today, in the summer of 2026.

A Problem Built in Stages

The duplication issue did not appear overnight. It accumulated across three distinct phases. First, legacy digitisation batches carried out before 2020 used inconsistent metadata schemas, meaning the same photograph could be stored under different file names with no automated flag to catch the overlap. Second, the Olympic-era import sprint, which pulled images from the Département de la Seine archives and several arrondissement-level cultural centres, including the Mairie du 10e on Rue du Faubourg Saint-Martin, brought in material without a mandatory deduplication check at the point of entry. Third, contractors working on the Grand Paris Express infrastructure documentation project inadvertently fed construction-phase photographs into the general municipal repository rather than a segregated project folder, adding a further layer of redundancy.
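
For readers wondering what a deduplication check "at the point of entry" might look like, here is a minimal sketch in Python. It is an illustration only, not the city's system: the `Repository` class, its `ingest` method and the file names are hypothetical. The idea is to key each file on a hash of its contents rather than its name, so the same photograph arriving under a different file name is caught before it is stored.

```python
import hashlib
from typing import Dict, Tuple


class Repository:
    """Hypothetical image store that refuses byte-identical re-imports."""

    def __init__(self) -> None:
        # Maps a SHA-256 content digest to the name of the first file stored with it.
        self._by_digest: Dict[str, str] = {}

    def ingest(self, file_name: str, data: bytes) -> Tuple[str, bool]:
        """Return (canonical_name, was_stored).

        If the same bytes were already ingested under another name,
        nothing is stored and the earlier name is returned instead.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing, False
        self._by_digest[digest] = file_name
        return file_name, True


repo = Repository()
first = repo.ingest("bhvp_0001.jpg", b"same image bytes")
second = repo.ingest("mairie10_scan.jpg", b"same image bytes")  # same content, new name
third = repo.ingest("gpe_site.jpg", b"different image bytes")
```

A check like this only catches byte-identical files. A re-scan or a resized copy of the same photograph would pass straight through, which is why near-duplicate matching is a separate problem.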

By early 2025, internal audits conducted by the Direction des Affaires Culturelles — the city body that oversees municipal cultural policy — had identified the scale of the backlog. The DAC does not publish granular operational data, but the Paris Musées open-access portal, which went live in 2020 and now hosts more than 300,000 images free of copyright restriction, had begun returning visibly degraded search results, with users encountering the same photograph multiple times under different catalogue entries.

Untangling the Archive

The practical consequences extend beyond inconvenience. Journalists, researchers and architecture firms using the Seine-Saint-Denis urban regeneration documentation — one of the legacy programmes tied to the post-Olympics transformation of the Plaine-Saint-Denis district — reported difficulty distinguishing authoritative image versions from low-resolution duplicates. For a city aggressively marketing itself as a model of open public data, the duplication problem carried reputational weight.

Since January 2026, the DAC has been running a structured deduplication programme. The approach involves hash-based fingerprinting to identify exact and near-exact duplicates, followed by a manual review tier for images where automated matching falls below a confidence threshold. Priority has been given to the Paris Musées portal and the municipal urban planning image bank maintained by the Atelier Parisien d'Urbanisme, known as APUR, on Avenue de Flandre in the 19th arrondissement. APUR's holdings are particularly consequential because they feed into planning consultation documents used by local residents challenging or supporting development projects across the Grand Paris perimeter.
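
The two-tier approach described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the DAC's actual pipeline: the function names, the `AUTO_MATCH` and `REVIEW_FLOOR` thresholds and the choice of a simple "average hash" for near-duplicates are all hypothetical, and each image is assumed to have already been decoded and downscaled to a small grayscale grid of equal size.

```python
import hashlib
from typing import List

# Grayscale values 0-255; assumed already downscaled to a small, equal-sized grid.
Pixels = List[List[int]]

# Hypothetical thresholds, chosen only for illustration.
AUTO_MATCH = 0.95    # at or above this, treat as a near-exact duplicate
REVIEW_FLOOR = 0.80  # between the two thresholds, send to a human reviewer


def exact_fingerprint(data: bytes) -> str:
    """SHA-256 of the raw file bytes; equal digests mean byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: Pixels) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def similarity(hash_a: int, hash_b: int, n_bits: int) -> float:
    """Fraction of matching bits between two hashes (1.0 means identical)."""
    differing = bin(hash_a ^ hash_b).count("1")
    return 1.0 - differing / n_bits


def triage(data_a: bytes, pixels_a: Pixels, data_b: bytes, pixels_b: Pixels) -> str:
    """Classify a pair of images into one of four buckets."""
    if exact_fingerprint(data_a) == exact_fingerprint(data_b):
        return "exact_duplicate"
    n_bits = sum(len(row) for row in pixels_a)
    score = similarity(average_hash(pixels_a), average_hash(pixels_b), n_bits)
    if score >= AUTO_MATCH:
        return "near_duplicate"
    if score >= REVIEW_FLOOR:
        return "manual_review"
    return "distinct"


# Toy 4x4 grids standing in for real scans.
original = [[0, 0, 255, 255] for _ in range(4)]
re_scan = [[10, 10, 240, 240] for _ in range(4)]   # same picture, slightly different exposure
altered = [[255, 0, 255, 255], [255, 0, 255, 255], [0, 0, 255, 255], [0, 0, 255, 255]]
inverted = [[255, 255, 0, 0] for _ in range(4)]
```

In this toy setup the re-scan differs from the original at the byte level but produces the same average hash, so it is matched automatically, while the altered grid differs in a couple of bits and lands in the manual review tier. Production systems typically rely on sturdier perceptual hashes and tune their thresholds against labelled samples.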

The deduplication work is expensive. Municipal procurement records published on the Marchés Publics platform show a contract awarded in November 2025 valued at €1.4 million over 18 months to a technical services provider for archival data remediation across city-managed repositories — though the contract covers a broader scope than image deduplication alone.

For anyone relying on these archives in the near term — researchers at Sciences Po, journalists working on banlieue housing stories, architects pulling historical facades for renovation permits — the practical advice is straightforward: cross-reference any image pulled from the Paris Musées portal or the APUR library against the source institution's own catalogue before publishing or submitting. The deduplication sweep is expected to complete its first priority phase by December 2026, but until that work is certified finished, the repositories remain partially unreliable. The city has said updated metadata standards will apply to all new ingestions from that point forward.


About this article

Published by The Daily Paris

This article was produced by The Daily Paris editorial desk and covers news in Paris. See our editorial standards for how we use AI.
