Paris's Duplicate Image Problem: The Numbers Exposing a Hidden Crisis in Urban Digital Archives
City agencies and cultural institutions are sitting on thousands of redundant digital files—and the data shows the cleanup bill is growing by the month.

Paris's public institutions collectively hold an estimated 4.2 million digital image files across municipal servers, and a significant share of them are duplicates. That figure, drawn from internal audits referenced in digital governance discussions at the Hôtel de Ville earlier this year, underscores a problem that has quietly consumed storage budgets and slowed archival access across the capital's sprawling network of cultural bodies and planning departments.
The issue matters now because the Grand Paris Express project, the largest urban infrastructure programme in Europe by some measures, is generating documentary image records at an unprecedented rate. Construction site photography, planning visuals, environmental assessments and communications assets are flowing into at least a dozen separate agency repositories, from Société du Grand Paris offices in Saint-Denis to urban planning units at the Mairie de Paris in the 4th arrondissement. Without systematic deduplication, those records risk becoming ungovernable within two or three budget cycles.
Storage costs are concrete. Commercial cloud storage for large institutional archives in France runs between €0.018 and €0.023 per gigabyte per month for standard-tier services, according to published rate cards from major European providers. At those rates, each redundant terabyte costs roughly €216 to €276 a year. A repository carrying 30 percent redundant image data, a conservative estimate for institutions without active deduplication protocols, pays for nearly one-third of its storage unnecessarily. Across a network the size of Paris's municipal digital infrastructure, that translates into tens of thousands of euros in avoidable annual expenditure.
The Bibliothèque nationale de France, headquartered in the 13th arrondissement along the quai François-Mauriac, has spent years building deduplication workflows into its Gallica digitisation programme. Its publicly documented approach uses hash-matching algorithms to flag pixel-identical copies before ingestion, catching redundancies at the point of entry rather than retrospectively. It is a model that smaller municipal bodies have been slow to replicate.
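For readers who want to see what hash-matching at the point of entry can look like, here is a minimal Python sketch. It is not the BnF's code: the `IngestGate` class and the `file_digest` helper are hypothetical names invented for illustration, and the sketch assumes that hashing a file's raw bytes with SHA-256 is enough. That catches byte-identical copies only; two files with the same pixels but different embedded metadata would need their pixel data decoded and hashed instead.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class IngestGate:
    """Hypothetical ingestion gate that remembers digests of accepted files."""

    def __init__(self) -> None:
        # Maps digest -> first file accepted with those exact bytes.
        self.seen: dict[str, Path] = {}

    def check(self, path: Path) -> Path | None:
        """Return the earlier byte-identical file, or None if this one is new."""
        digest = file_digest(path)
        earlier = self.seen.get(digest)
        if earlier is None:
            self.seen[digest] = path  # first copy: accept and remember it
        return earlier
```

In use, an upload script would call `gate.check(path)` for each incoming file and skip or flag any file for which the call returns an earlier path. A production system would presumably keep the digests in a database rather than in memory.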
At the Centre Pompidou in the 4th arrondissement, digital collections staff have acknowledged in published conference proceedings how hard it is to manage image assets that arrive from multiple contributing departments with inconsistent metadata standards. That inconsistency makes automated duplicate detection harder and more expensive to implement after the fact. The problem compounds over time. A 2024 European Commission report on public sector digital asset management found that institutions which delay deduplication by five years face cleanup costs roughly three times higher than those that build detection into initial workflows.
The Paris 2024 Olympics generated an additional layer of complexity. The Games produced hundreds of thousands of official documentary photographs distributed across the Délégation interministérielle aux Jeux olympiques et paralympiques, the Paris Organising Committee's legacy archive, and partner city agencies. Image assets were shared, redistributed and re-uploaded across platforms with minimal coordination on file identity. Legacy activation efforts running through 2026 are now trying to build coherent public-facing archives from that fragmented base.
The practical arithmetic is straightforward. A 10-terabyte image archive with 28 percent duplication, which is within the normal range for institutions that accept bulk uploads without filtering, wastes 2.8 terabytes of paid storage. Multiply that across Seine-Saint-Denis urban regeneration project files, the Atelier parisien d'urbanisme's planning photography and the city's communications directorate, and at Parisian institutional scale the redundant data footprint runs into dozens of terabytes annually.
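That arithmetic can be checked in a few lines of Python. The sketch below uses the 10-terabyte archive and 28 percent duplication rate from the example above, priced at the €0.018 to €0.023 per gigabyte per month range quoted earlier. The function names are illustrative, and the calculation assumes decimal units (1 terabyte = 1,000 gigabytes) and a flat rate with no volume discounts.

```python
def wasted_terabytes(archive_tb: float, duplication_rate: float) -> float:
    """Storage occupied by duplicate files, in terabytes."""
    return archive_tb * duplication_rate


def annual_cost_eur(terabytes: float, eur_per_gb_month: float) -> float:
    """Yearly cost of storing the given volume (assumes 1 TB = 1,000 GB)."""
    return terabytes * 1000 * eur_per_gb_month * 12


wasted = wasted_terabytes(10, 0.28)      # the article's example archive
low = annual_cost_eur(wasted, 0.018)     # lower end of the quoted rate range
high = annual_cost_eur(wasted, 0.023)    # upper end of the quoted rate range

print(f"{wasted:.1f} TB wasted, about €{low:.0f} to €{high:.0f} per year")
```

For a single 10-terabyte archive, the avoidable bill comes to roughly €605 to €773 a year. The larger totals arise only when the same waste is repeated across many repositories.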
Institutions that want to get ahead of this should start with three concrete steps: commission a hash-based audit of existing repositories to establish a duplication baseline, adopt a common metadata standard before the next major asset ingestion cycle, and designate a single point of ownership for each image category. The Hôtel de Ville's digital services directorate has the administrative leverage to mandate cross-departmental standards. The question is whether the political will to do so survives the current pressures on Macron's government to keep municipal budgets visibly under control ahead of the next legislative cycle. Storage inefficiency rarely makes headlines, but the numbers behind it are real and they accumulate quietly until they cannot be ignored.
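The first of those steps, a hash-based audit, can be sketched in Python along the following lines. This is an illustration under stated assumptions rather than any institution's actual tooling: the `duplication_baseline` function, its output keys and the list of image file extensions are all hypothetical, and files are treated as duplicates only when their bytes match exactly.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; a real audit would adapt this list.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplication_baseline(root: Path) -> dict[str, float]:
    """Walk a repository and measure how much of it is byte-identical copies."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[file_digest(path)].append(path)

    files = redundant_files = total_bytes = redundant_bytes = 0
    for paths in groups.values():
        size = paths[0].stat().st_size   # identical bytes imply identical size
        files += len(paths)
        redundant_files += len(paths) - 1  # every copy beyond the first
        total_bytes += size * len(paths)
        redundant_bytes += size * (len(paths) - 1)

    rate = redundant_bytes / total_bytes if total_bytes else 0.0
    return {
        "files": files,
        "redundant_files": redundant_files,
        "total_bytes": total_bytes,
        "redundant_bytes": redundant_bytes,
        "duplication_rate": rate,
    }
```

Calling `duplication_baseline(Path("/path/to/repository"))` returns the share of stored bytes taken up by redundant copies, which is the baseline figure an institution would track from one audit to the next.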
Published by The Daily Paris