Paris Takes a Harder Line on Duplicate Images in Public Databases Than London or Berlin
As cities race to clean up digitised heritage and housing records, Paris is betting on stricter automated auditing — but the backlog is still enormous.

French archivists and municipal data managers are confronting a problem that has quietly ballooned alongside the city's post-Olympics digital acceleration: thousands of duplicate images embedded across public-facing databases, from housing permit portals to the digitised collections of the Bibliothèque nationale de France. Paris city hall acknowledged the scope of the issue in a working document circulated to the Direction de l'Urbanisme in May 2026, which flagged that duplicate or near-duplicate image files were inflating storage costs and distorting search results in at least three major municipal platforms.
The timing matters. Paris spent heavily on digital infrastructure ahead of the 2024 Olympics, migrating records, launching new resident-facing portals and digitising decades of planning files to support the ongoing Grand Paris Express construction corridor. That sprint created ideal conditions for data bloat. Files were uploaded multiple times by different agencies, resized versions were stored alongside originals without consistent naming conventions, and no unified deduplication protocol existed across arrondissement-level services.
The city's current response centres on two programmes. The Atelier Parisien d'Urbanisme, known as APUR, began a structured image audit of its cartographic and photographic holdings in January 2026, targeting roughly 400,000 georeferenced images accumulated since 2018. Separately, the BnF's Gallica platform, which hosts more than eight million digitised documents accessible to the public, launched an internal deduplication initiative in late 2025 using perceptual hashing, a technique that identifies visually identical or near-identical images even when file names differ. BnF technical staff presented early results at a data governance seminar in Toulouse in March 2026, describing a first-pass duplicate rate of around 4.2 percent across specific periodical collections.
That figure, while not catastrophic, represents tens of thousands of files. For a platform that serves researchers across Europe, the consequences are real: duplicated images inflate catalogue entries, confuse metadata tagging and in some cases cause images to surface multiple times in a single search query, burying distinct results. The BnF's deduplication work is expected to continue through the end of 2026, with full integration into Gallica's indexing system targeted for the first quarter of 2027.
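For readers curious how perceptual hashing works in principle, the sketch below shows one of the simplest variants, an average hash. It is an illustration only: the BnF has not published which algorithm or tooling it uses, and the function names, the 8x8 hash size and the sample images here are all assumptions. The idea is to shrink an image to a tiny grid, mark each cell as brighter or darker than the grid's mean, and compare the resulting bit patterns.

```python
def average_hash(pixels, hash_size=8):
    """Return an integer fingerprint for a grayscale image.

    `pixels` is a 2D list of brightness values (0-255). The image must be
    at least hash_size pixels wide and tall. Hypothetical helper, not taken
    from any BnF tooling.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to hash_size x hash_size by averaging blocks of pixels.
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


# Made-up sample images: a 32x32 horizontal gradient, the same picture
# enlarged to 64x64, and an unrelated vertical gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
upscaled = [[(x // 2) * 8 for x in range(64)] for _ in range(64)]
different = [[y * 8 for _ in range(32)] for y in range(32)]

same_picture = hamming_distance(average_hash(original), average_hash(upscaled))
other_picture = hamming_distance(average_hash(original), average_hash(different))
```

In this toy case the resized copy produces the same fingerprint as the original, while the unrelated image differs in many bits. A real pipeline would decode actual image files and choose a distance threshold for "near-duplicate" by testing against its own collections.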
The Marais neighbourhood's Centre Pompidou, which manages its own digital collection separately from the BnF, has not publicly disclosed a deduplication timeline. Its IRCAM research arm, based on the Place Igor-Stravinsky, is experimenting with AI-assisted image clustering tools developed in partnership with a French tech consortium, though those trials remain at the prototype stage.
London and Berlin have moved earlier and with more standardised frameworks. The British Library completed a deduplication sweep of its digitised newspaper archive — roughly 900 million page images — in 2023 using an open-source toolkit developed with the Alan Turing Institute. The project reduced redundant storage by an estimated 11 percent and is now a reference case in European digital heritage circles. Berlin's Staatsbibliothek adopted a mandatory deduplication protocol for all new digitisation contracts from January 2024, meaning every vendor delivering scanned content must provide a hash-verified unique-image manifest before files are accepted into the central repository.
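What a hash-verified manifest check might look like can be sketched in a few lines. This is a hypothetical illustration, not the Staatsbibliothek's actual protocol: the article's sources do not specify the hash algorithm or manifest format, so SHA-256, the function name and the sample file names below are assumptions.

```python
import hashlib


def build_manifest(files):
    """Fingerprint each delivered file and flag exact repeats.

    `files` maps a file name to its raw bytes (hypothetical input shape).
    Returns (manifest, duplicates): manifest maps each SHA-256 digest to the
    first file name seen with that content; duplicates lists
    (repeated file, file it repeats) pairs.
    """
    manifest = {}
    duplicates = []
    for name in sorted(files):  # sorted so the result is deterministic
        digest = hashlib.sha256(files[name]).hexdigest()
        if digest in manifest:
            duplicates.append((name, manifest[digest]))
        else:
            manifest[digest] = name
    return manifest, duplicates


# Made-up delivery containing one file submitted twice under different names.
delivery = {
    "plan_001.tif": b"scan-A",
    "plan_001_copy.tif": b"scan-A",
    "plan_002.tif": b"scan-B",
}
manifest, duplicates = build_manifest(delivery)
```

A cryptographic hash such as this one only catches byte-for-byte identical files. Resized or re-encoded copies would pass unnoticed, which is the gap perceptual hashing is meant to close.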
Paris does not yet have a vendor-side requirement equivalent to Berlin's. The Direction des Affaires Culturelles has been consulting on a draft standard since autumn 2025, but as of July 2026 it has not been formalised. Housing data presents an even patchier picture. The Agence Nationale de l'Habitat (ANAH), which administers renovation grant applications, accepts image uploads as proof of completed works, and duplicate submissions have been identified as one vector for inflated claims. A February 2026 parliamentary finance committee review of ANAH's Monlogement portal raised that concern, though the committee did not quantify the financial exposure.
For residents and institutions navigating these systems now, the practical advice from data governance specialists is straightforward: when submitting documents to any Paris municipal portal, use a consistent file-naming convention that includes date and version number, avoid re-uploading resized copies, and check whether a portal offers a duplicate-detection warning before final submission. Several arrondissement planning desks, including those serving the 10th and 13th, have added informal guidance to their online submission pages. The broader fix, however, waits on city hall to move a draft technical standard off the shelf and into enforcement.
Published by The Daily Paris