Paris's Duplicate Image Problem: The Numbers Driving a City-Wide Digital Clean-Up
Thousands of redundant photographs are clogging the archives of Paris's public institutions — and the cost of fixing it is finally forcing the data into the open.

At least 340,000 duplicate image files have been identified across the digital archives of the City of Paris's cultural and administrative bodies, according to internal audits reviewed this spring. The figure, compiled as part of the broader Grand Paris Express infrastructure documentation project, has quietly become a headache for archivists, urban planners, and the communications teams now racing to build a coherent post-Olympics digital legacy.
The sheer volume matters because Paris is not simply storing holiday snapshots. Since the Paris 2024 Olympics, municipal bodies have accumulated documentation on Seine riverbank regeneration works, the transformation of the Stade de France corridor in Saint-Denis, and the rollout of 68 new Grand Paris Express stations — all of it photographed repeatedly by different contractors using incompatible file-naming systems. The result is warehouses of data that cost money to store, slow down public-facing platforms, and, in several cases, have generated legal disputes over which version of an image carries the correct rights clearance.
The Bibliothèque nationale de France, whose Tolbiac site on the Quai François Mauriac holds one of Europe's largest photographic repositories, has flagged the duplicate image issue in its annual digital preservation reports since at least 2022. Société du Grand Paris, the public body overseeing the metro expansion, employs a dedicated team of data managers at its La Défense headquarters specifically to deduplicate construction-phase photography, a line item that cost roughly €420,000 in 2025, according to public procurement records published on the French government's data.gouv.fr portal.
The Paris Musées network, which aggregates the collections of 14 city-owned museums including the Musée Carnavalet on Rue des Francs-Bourgeois and the Petit Palais on Avenue Winston Churchill, opened roughly 300,000 images to open-access licensing in 2020. Since then, the network's digital teams have recorded a duplication rate of approximately 18 percent across uploaded files — meaning nearly one in five images in the public collection exists in at least two identical or near-identical versions under different catalogue numbers. That creates confusion for researchers, inflates storage costs, and occasionally results in competing copyright metadata attached to the same photograph.
At the Hôtel de Ville, the Direction de l'Urbanisme has been using automated deduplication software since January 2026 as part of its housing and planning dashboard, a system tied to the city's effort to map rental market pressure across arrondissements in real time. The software, supplied under a contract published through the Marchés publics platform, flags duplicate satellite and street-level images before they enter the planning database. Without it, planners say, the database would carry an estimated 22 percent redundancy rate, skewing density calculations for neighbourhoods such as Belleville and La Chapelle, where housing tension is already acute.
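The article does not say how the Direction de l'Urbanisme's software works. As an illustration only, the most basic form of such a pre-ingest check compares a cryptographic hash of each incoming file with the hashes of files already admitted. The sketch below assumes exact, byte-level matching and treats the "redundancy rate" as the share of submitted files flagged as duplicates. The `DedupGate` class and its methods are hypothetical, and near-duplicate detection (crops, re-encodes, resized copies) is outside its scope.

```python
import hashlib


class DedupGate:
    """Hypothetical pre-ingest gate that flags byte-identical images
    before they reach a downstream database. Exact matches only:
    cropped or re-encoded copies pass through unflagged."""

    def __init__(self) -> None:
        self._seen: set[str] = set()  # SHA-256 digests of admitted files
        self.submitted = 0
        self.flagged = 0

    def admit(self, image_bytes: bytes) -> bool:
        """Return True if the image is new and should be stored,
        False if an identical copy was already admitted."""
        self.submitted += 1
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest in self._seen:
            self.flagged += 1
            return False
        self._seen.add(digest)
        return True

    def redundancy_rate(self) -> float:
        """Share of submitted files flagged as duplicates (0.0 if none submitted)."""
        return self.flagged / self.submitted if self.submitted else 0.0


if __name__ == "__main__":
    gate = DedupGate()
    for blob in [b"img-a", b"img-b", b"img-a", b"img-c", b"img-a"]:
        gate.admit(blob)
    # The second and third copies of img-a are flagged: 2 of 5 submissions.
    print(gate.redundancy_rate())  # 0.4
```

Hashing the full file is cheap and has essentially no false positives, which is why exact matching is a common first pass before any more expensive similarity check.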
Deduplication is not cheap. Market rates for enterprise-grade image deduplication services in France currently run between €0.003 and €0.007 per file processed, depending on resolution and metadata complexity, according to pricing published by French cloud providers in public tender documents. Applied to a 340,000-file backlog, that puts the baseline processing cost at roughly €1,020 to €2,380. That is modest on paper, but it excludes the human review required when automated tools flag near-duplicates that differ only by a timestamp or a crop, and that labour can multiply the total tenfold.
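The baseline range is simply the backlog multiplied by the quoted per-file rates. A minimal sketch, assuming a flat per-file price and treating human-review overhead as a single multiplier; the factor of ten is used here only as the illustrative upper bound the article gives, not a measured figure, and `estimate_cost` is a hypothetical helper:

```python
def estimate_cost(files: int, rate_low: float, rate_high: float,
                  review_multiplier: float = 1.0) -> tuple[float, float]:
    """Return a (low, high) cost range in euros for processing `files`
    images at flat per-file rates, scaled by an assumed multiplier
    standing in for human review of flagged near-duplicates."""
    low = files * rate_low * review_multiplier
    high = files * rate_high * review_multiplier
    # Round to cents to hide floating-point noise in the result.
    return round(low, 2), round(high, 2)


if __name__ == "__main__":
    # Baseline: automated processing only.
    print(estimate_cost(340_000, 0.003, 0.007))  # (1020.0, 2380.0)
    # With the article's "factor of ten" applied for human review.
    print(estimate_cost(340_000, 0.003, 0.007, review_multiplier=10))  # (10200.0, 23800.0)
```

A single multiplier is a blunt model: in practice review cost scales with the number of flagged near-duplicates rather than the whole backlog.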
The European Union's digital preservation framework, updated under the 2023 Data Act, now requires public bodies in member states to maintain de-duplicated, interoperable archives for any infrastructure project receiving EU co-financing. Grand Paris Express has received substantial EU structural funds. That regulatory pressure is one reason the clean-up, long deferred, is suddenly moving.
For institutions still working through their backlogs, the practical path forward involves prioritising images attached to active legal or planning files first, then running automated hash-matching tools across static historical collections. Paris Musées has committed to completing its deduplication audit by the end of the third quarter of 2026. Whether the city's broader administrative apparatus meets a similar deadline is, at this point, an open question that the Direction Générale des Systèmes d'Information has not yet publicly answered.
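The article does not name the hash-matching tools involved. A minimal sketch of the general technique, assuming a folder of image files on local disk: group files by size first (files of different sizes cannot be byte-identical), then by the SHA-256 digest of their contents. The function names are hypothetical. Because it compares raw bytes, this approach only finds exact copies; two files whose embedded metadata differs only by a timestamp will hash differently and would need a separate near-duplicate check or human review.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large images need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Return groups (two or more files each) of byte-identical files under `root`."""
    # Cheap pre-filter: bucket files by size, since only equal sizes can match.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size means no possible duplicate
        # Only hash files that share a size with at least one other file.
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in same_size:
            by_hash[sha256_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

The size pre-filter matters on large archives: most files have a unique size and are never read, so hashing cost is spent only on plausible candidates.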
Published by The Daily Paris