The Daily Paris

Paris news, every day

Paris Archives in Numbers: The Hidden Scale of the Duplicate Image Problem

Thousands of redundant photographs are clogging municipal databases across the capital, and the cost of cleaning them up is proving harder to quantify than the mess itself.

By Paris News Desk · Published 4 July 2026, 9:16 pm

Photo: Transylvania University. [from old catalog] Green, Lewis Warner, 1806-1963. [from old catalog] / Public domain (Wikimedia Commons)

City archivists and digital records managers across Paris have flagged a measurable but largely invisible crisis: duplicate images now account for a significant share of storage across municipal photo databases, costing public institutions money and slowing access to the documents that residents and researchers rely on every day. Estimates circulating within the Direction des Affaires Culturelles de Paris — the city body that oversees heritage digitisation — suggest that duplicate or near-duplicate image files make up between 20 and 35 percent of total stored assets in some departmental collections, though those figures vary sharply depending on the institution and how aggressively data hygiene has been applied.

The timing matters. Paris is three years into a major push to digitise its urban memory, partly in response to the extraordinary visual archive generated around the 2024 Olympics and the continuing Seine riverside regeneration stretching from Bercy to the Bois de Boulogne. Millions of photographs, scans, drone footage stills and architectural drawings have been ingested into centralised systems since 2023. That volume, handled quickly and often by multiple teams working in parallel, is precisely the environment in which duplication compounds fastest.

What the Numbers Actually Show

The Bibliothèque historique de la Ville de Paris, based on the Rue Pavée in the 4th arrondissement, completed an internal audit of its digitised photographic collection in early 2026. The audit found several thousand redundant entries in its online catalogue: images uploaded more than once under different metadata tags, or scanned in separate batches during the Covid-era rush to make collections remotely accessible. Storage costs for municipal archives in the Île-de-France region have risen sharply. Server space for cultural institutions in the Grand Paris area averaged roughly €18 per terabyte per month in cloud-hosting contracts signed during 2024 and 2025, according to procurement documents available through public procurement transparency registers. When redundant files account for a quarter of stored volume, the wasted spend across dozens of institutions adds up fast.
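As a rough illustration of that arithmetic, the Python sketch below applies the €18 per terabyte per month average and the one-quarter duplicate share to a single collection. The 200-terabyte collection size and the count of 30 institutions are assumptions chosen for the example, not figures from the audit or the procurement documents.

```python
# Rough estimate of storage spend wasted on duplicate files.
PRICE_PER_TB_MONTH_EUR = 18.0  # average contract price cited in the article
DUPLICATE_SHARE = 0.25         # "a quarter of stored volume"


def wasted_spend_per_year(stored_tb: float,
                          duplicate_share: float = DUPLICATE_SHARE,
                          price_per_tb_month: float = PRICE_PER_TB_MONTH_EUR) -> float:
    """Annual cost, in euros, of the storage occupied by redundant files."""
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    return stored_tb * duplicate_share * price_per_tb_month * 12


# Hypothetical inputs (assumptions, not reported figures).
collection_tb = 200
institutions = 30

per_institution = wasted_spend_per_year(collection_tb)
print(f"One institution: EUR {per_institution:,.0f} per year")
print(f"{institutions} institutions: EUR {per_institution * institutions:,.0f} per year")
```

With these assumed inputs the estimate comes to €10,800 a year for one institution and €324,000 across thirty. The real figure depends on each institution's actual volume and duplicate share.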

The problem extends beyond the historic archives. Paris Musées, the consortium that manages 14 municipal museums including the Musée Carnavalet and the Petit Palais, launched an open-data image portal in 2021 that now hosts more than 300,000 freely downloadable works. Administrators working on that platform have acknowledged, in public documentation published on data.gouv.fr, that deduplication remains an ongoing technical challenge — particularly for items donated or transferred from satellite collections that had already been partially digitised by third parties.

The Deduplication Challenge in Practice

Automated deduplication tools — software that compares image hash values or uses perceptual matching algorithms — can catch exact duplicates reliably, but they struggle with near-duplicates: two scans of the same 19th-century map of the Marais taken at different resolutions, or two photographs of the Pont de Bir-Hakeim shot seconds apart by different photographers during the same press event. Those near-duplicates require human review, and human review costs time and specialist labour that most municipal archives budget for only marginally.
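The gap between the two techniques can be shown in a few lines of Python. This is a minimal sketch, not the method used by any of the institutions mentioned. It pairs a SHA-256 digest of the file bytes with an "average hash", one simple perceptual technique chosen here for illustration. Small grayscale grids stand in for images that a real pipeline would first decode and downscale with an imaging library, and all sample data is hypothetical.

```python
import hashlib


def exact_digest(data: bytes) -> str:
    """SHA-256 of the raw bytes: equal only for byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """One bit per pixel of a small grayscale grid, set when the pixel
    is brighter than the grid's mean brightness."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def to_bytes(pixels: list[list[int]]) -> bytes:
    """Flatten a grid into bytes, standing in for a file's contents."""
    return bytes(value for row in pixels for value in row)


# Hypothetical 4x4 grids: scan_b is scan_a rescanned slightly brighter,
# other is an unrelated image.
scan_a = [[20, 220, 20, 220], [220, 20, 220, 20],
          [20, 220, 20, 220], [220, 20, 220, 20]]
scan_b = [[value + 5 for value in row] for row in scan_a]
other = [[20, 20, 220, 220] for _ in range(4)]

print(exact_digest(to_bytes(scan_a)) == exact_digest(to_bytes(scan_b)))  # False
print(hamming_distance(average_hash(scan_a), average_hash(scan_b)))      # 0
print(hamming_distance(average_hash(scan_a), average_hash(other)))       # 8
```

The byte-level digest treats the brighter rescan as a different file, while the perceptual fingerprint is unchanged. How small a distance should count as a probable near-duplicate is a tuning decision, which is one reason flagged matches still go to a human reviewer.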

The Grand Paris Express metro project, now in its final construction phases across the outer suburbs, has generated its own parallel image-duplication headache. Project documentation photography — conducted by multiple contractors across worksites in Saint-Denis, Vitry-sur-Seine and Champigny-sur-Marne — has produced overlapping visual records sitting across at least three separate project management platforms, according to procurement summaries published by Société du Grand Paris.

For institutions trying to address the backlog, the practical path forward involves a phased approach: automated hash-matching first, applied across entire collections to eliminate exact duplicates with minimal risk; followed by perceptual-similarity scanning to flag probable near-duplicates for human triage; and finally, a metadata standardisation pass to prevent the conditions that created the problem from recurring. Several arrondissement-level mairies have already contracted with specialist data firms based in the 13th arrondissement's tech cluster around the Olympiades district to begin that work. The price per 100,000-image batch for a full deduplication audit was quoted in the range of €8,000 to €15,000 in tender responses filed with the city procurement office earlier this year, a figure that will concentrate minds among budget managers already squeezed by National Assembly pressure on municipal spending.
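The three phases can be sketched as a small Python pipeline. Everything here is an illustrative assumption rather than a description of any contractor's system: the `ImageRecord` fields, the distance threshold of 4 bits, the sample records, and the choice of title normalisation rules are all hypothetical. Each record is assumed to arrive with a file digest and a perceptual fingerprint already computed.

```python
import unicodedata
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ImageRecord:
    record_id: str
    digest: str  # hash of the file bytes
    phash: int   # perceptual fingerprint, as an integer bit string
    title: str   # free-text catalogue metadata


def drop_exact_duplicates(records):
    """Phase 1: keep the first record seen for each file digest."""
    first_seen = {}
    for record in records:
        first_seen.setdefault(record.digest, record)
    return list(first_seen.values())


def flag_near_duplicates(records, max_distance=4):
    """Phase 2: pair up records whose fingerprints differ in few bits.
    Nothing is deleted; the pairs go to a human reviewer."""
    flagged = []
    for a, b in combinations(records, 2):
        if bin(a.phash ^ b.phash).count("1") <= max_distance:
            flagged.append((a.record_id, b.record_id))
    return flagged


def normalise_title(title):
    """Phase 3: one consistent form for titles (Unicode NFC, collapsed
    whitespace, case-folded) so re-uploads are easier to spot."""
    text = unicodedata.normalize("NFC", title)
    return " ".join(text.split()).casefold()


# Hypothetical catalogue entries.
records = [
    ImageRecord("A1", "aa11", 0b10110010, "Pont de Bir-Hakeim"),
    ImageRecord("A2", "aa11", 0b10110010, "pont de  bir-hakeim "),
    ImageRecord("B1", "bb22", 0b10110011, "Pont de Bir-Hakeim, press event"),
    ImageRecord("C1", "cc33", 0b01001100, "Plan du Marais"),
]

unique = drop_exact_duplicates(records)
print([r.record_id for r in unique])   # ['A1', 'B1', 'C1']
print(flag_near_duplicates(unique))    # [('A1', 'B1')]
print(normalise_title("pont de  bir-hakeim "))  # pont de bir-hakeim
```

The pairwise comparison in phase 2 grows quadratically with collection size, so it is suitable only for illustration. A collection of hundreds of thousands of images would need some form of indexing to narrow the candidates first.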

About this article

Published by The Daily Paris

This article was produced by The Daily Paris editorial desk and covers news in Paris. See our editorial standards for how we use AI.
