Paris Digitisation Drive Exposes Scale of Duplicate Image Problem Clogging City Archives
New figures reveal tens of thousands of redundant files are slowing the Grand Paris digital infrastructure project and costing municipal departments real money.

Paris city archives hold more than 4.2 million digitised images across municipal departments, and administrators now estimate that between 12 and 18 percent of those files are exact or near-exact duplicates, according to internal assessments circulated to the Direction de la Mémoire, du Patrimoine et de l'Architecture earlier this year. That implies between roughly 500,000 and 750,000 redundant image files consuming server capacity, distorting search results and complicating the legacy data work tied to the post-Olympics Grand Paris digital infrastructure push.
The timing matters. Since the Paris 2024 Games concluded, the city has accelerated a broad digitisation programme intended to make public records, planning documents and cultural heritage materials accessible through a unified online portal. The Bibliothèque historique de la Ville de Paris on rue de Rivoli and the Archives de Paris on boulevard Sérurier are both feeding tens of thousands of scanned documents into that system monthly. Duplicate images were always a latent problem in analogue archives. In a shared digital environment, they become an active obstruction — clogging pipelines, inflating storage costs and returning false positives in public-facing searches.
Storage is not cheap. Municipal cloud and hybrid server contracts for Île-de-France public bodies have risen sharply since 2022, with indicative market rates for managed archival storage running between €0.018 and €0.024 per gigabyte per month at the volume tiers relevant to a city administration. Applied to the duplicate file load flagged in those internal assessments, those rates point to wasted expenditure that could reach the low six figures annually, depending on the size of the scans and how many backup copies are kept, and that is before accounting for staff hours spent manually reconciling records.
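The arithmetic behind an estimate of this kind can be sketched in a few lines of Python. The sketch multiplies the number of duplicate files by an average file size, a number of stored copies and the quoted per-gigabyte rates. The average file sizes and the copy count are hypothetical inputs chosen for illustration, not figures from the internal assessments, and the sketch uses decimal units (1 GB = 1,000 MB).

```python
"""Back-of-envelope estimate of what storing duplicate images costs.

Assumption: average file size and number of stored copies are hypothetical
inputs for illustration; only the file count, duplicate share and per-GB
rates come from the reported figures.
"""


def duplicate_count(total_files: int, duplicate_share: float) -> int:
    """Number of duplicate files implied by a share (0.12 means 12 percent)."""
    return round(total_files * duplicate_share)


def annual_storage_cost_eur(
    files: int,
    avg_file_mb: float,
    price_per_gb_month: float,
    copies: int = 1,
) -> float:
    """Yearly cost of storing `files` images, in decimal units (1 GB = 1000 MB)."""
    total_gb = files * avg_file_mb * copies / 1000
    return total_gb * price_per_gb_month * 12


if __name__ == "__main__":
    TOTAL_IMAGES = 4_200_000
    for share in (0.12, 0.18):
        dups = duplicate_count(TOTAL_IMAGES, share)
        print(f"{share:.0%} duplicates -> {dups:,} files")

    upper = duplicate_count(TOTAL_IMAGES, 0.18)
    # Hypothetical average sizes for archival scans, in MB; copies defaults to 1.
    for avg_mb in (50, 200, 500):
        low = annual_storage_cost_eur(upper, avg_mb, 0.018)
        high = annual_storage_cost_eur(upper, avg_mb, 0.024)
        print(f"{avg_mb} MB per file: EUR {low:,.0f} to EUR {high:,.0f} a year")
```

Because the result scales linearly with both file size and copy count, the storage bill alone reaches the low six figures only if the duplicates are large archival scans, in the hundreds of megabytes each, or are held in several replicated copies.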
The Grand Paris Express metro project has its own documentation problem that mirrors the broader municipal one. Construction documentation for the eighteen new stations currently under active build, including those serving Saint-Denis Pleyel and Villejuif-Institut Gustave Roussy, runs to hundreds of thousands of engineering images, site photographs and compliance scans. Project managers working across Société du Grand Paris have flagged duplicate imagery as a quality-control risk, since outdated or replicated site photographs can be mistakenly referenced during inspection sign-offs. Société du Grand Paris has not published a specific figure for its duplicate file rate, but the issue was identified as a workflow priority in its 2025 operational review cycle.
Deduplication software has existed for years, but municipal procurement cycles lag commercial ones by a significant margin. The standard tool in French public-sector digitisation contracts has historically been manual spot-checking supplemented by basic hash-matching — a method that catches identical files but misses near-duplicates, where a scan has been cropped, recompressed or lightly annotated. More sophisticated perceptual hashing and AI-assisted image comparison tools, widely deployed in commercial media archives since around 2023, are only now entering the procurement conversation at the Direction des Systèmes et Technologies de l'Information de la Ville de Paris.
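The gap between basic hash-matching and perceptual hashing can be illustrated in Python. This is a minimal sketch, assuming each image has already been decoded and shrunk to an 8-by-8 grid of greyscale values (a real pipeline would do that step with an imaging library). It uses average hashing, one of the simplest perceptual hashing schemes, which may differ from whatever tools the city evaluates; the 10-bit distance threshold is likewise an illustrative assumption.

```python
import hashlib
from typing import List

# Assumption: each image has already been decoded, converted to greyscale and
# shrunk to 8x8 pixels (values 0-255) by some imaging library.
Grid = List[List[int]]


def to_bytes(grid: Grid) -> bytes:
    """Flatten a grid into raw bytes, standing in for a file's contents."""
    return bytes(p for row in grid for p in row)


def exact_fingerprint(data: bytes) -> str:
    """Basic hash-matching: SHA-256 digests match only for byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def average_hash(grid: Grid) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 10) -> bool:
    """Near-duplicate if the 64-bit average hashes differ in at most max_distance bits.

    The default threshold of 10 is an illustrative assumption, not a standard.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance


if __name__ == "__main__":
    # Synthetic "scan": a diagonal brightness gradient.
    original = [[(x + y) * 16 for x in range(8)] for y in range(8)]
    # Simulated recompression noise: every other column is one level brighter.
    altered = [[p + (x % 2) for x, p in enumerate(row)] for row in original]

    same_bytes = exact_fingerprint(to_bytes(original)) == exact_fingerprint(
        to_bytes(altered)
    )
    print("exact match:", same_bytes)
    print("near-duplicate:", is_near_duplicate(original, altered))
```

A cryptographic hash such as SHA-256 changes completely when any pixel value changes, so the lightly altered copy slips past exact matching, while its average hash differs from the original's in only a handful of bits and is flagged as a near-duplicate.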
The practical path forward involves three overlapping steps that several arrondissement-level cultural services are already piloting in a limited way. First, automated deduplication must run against existing collections before any new batch is ingested — not after. Second, metadata standards need harmonising across the Archives de Paris, the Bibliothèque historique and the smaller municipal museum documentation offices clustered around the Marais and the 13th arrondissement's redeveloped riverfront. Third, staff training needs to catch up: archivists who spent careers managing physical collections are being asked to apply digital asset management logic without always having received structured retraining.
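The first of those steps, checking incoming files against what is already held, amounts to an ingest gate. The Python sketch assumes a hypothetical `ArchiveIndex` that stores a SHA-256 fingerprint for every file in the existing collection; the class name and the record identifiers are invented for illustration. It catches only byte-identical copies, so a production system would also need a near-duplicate check at the same point.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ArchiveIndex:
    """Hypothetical fingerprint index over an existing collection."""

    # Maps a SHA-256 digest to the record that already holds that file.
    by_digest: Dict[str, str] = field(default_factory=dict)

    def add_existing(self, record_id: str, data: bytes) -> None:
        """Fingerprint a file already in the archive (first holder wins)."""
        self.by_digest.setdefault(hashlib.sha256(data).hexdigest(), record_id)

    def ingest_batch(
        self, batch: List[Tuple[str, bytes]]
    ) -> Tuple[List[str], Dict[str, str]]:
        """Accept new files, diverting exact duplicates before they are stored.

        Returns (accepted ids, {rejected id: id of the copy already held}).
        Each accepted file is indexed immediately, so repeats inside the same
        batch are caught as well.
        """
        accepted: List[str] = []
        rejected: Dict[str, str] = {}
        for record_id, data in batch:
            digest = hashlib.sha256(data).hexdigest()
            existing = self.by_digest.get(digest)
            if existing is not None:
                rejected[record_id] = existing
            else:
                self.by_digest[digest] = record_id
                accepted.append(record_id)
        return accepted, rejected


if __name__ == "__main__":
    index = ArchiveIndex()
    # Fingerprint the existing collection before any new batch arrives.
    index.add_existing("AP-0001", b"scan of plan, 1868")
    # Then run the new batch through the gate.
    accepted, rejected = index.ingest_batch([
        ("NEW-01", b"scan of plan, 1868"),    # already held
        ("NEW-02", b"scan of facade, 1902"),
        ("NEW-03", b"scan of facade, 1902"),  # repeated inside the batch
    ])
    print(accepted)  # ['NEW-02']
    print(rejected)  # {'NEW-01': 'AP-0001', 'NEW-03': 'NEW-02'}
```

Running the check at ingest rather than afterwards means a rejected file never enters the catalogue, so there is no later clean-up of search indexes or cross-references that already point at it.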
A tender for a city-wide digital asset management platform is expected to be published through the Marchés Publics portal before the end of the third quarter of 2026. Whether that contract will include mandatory deduplication tooling as a baseline requirement — rather than an optional module — will determine whether Paris's digitisation ambitions produce a genuinely searchable public record or simply a very large, very expensive drawer full of near-identical photographs of the same Haussmann façades.
About this article
Published by The Daily Paris