Paris's municipal archive authority, the Direction des Affaires Culturelles, is currently running a structured audit of its public-facing image repositories, targeting thousands of duplicate photographs and scanned documents that have accumulated across city databases since a major digitisation push began in 2019. The effort, centred on the Bibliothèque historique de la Ville de Paris on Rue de Rivoli and the city's open-data portal Paris Data, aims to reduce storage redundancy and improve search accuracy for researchers and the general public alike. Seven years into the broader digitisation project, the duplicate problem has become too large to ignore.
The timing matters for two reasons. The legacy commitments of the Paris 2024 Olympics include a pledge to make cultural and historical assets more accessible online, and duplicated image entries directly undermine that goal by cluttering search results and misleading users. Meanwhile, the Grand Paris Express construction project is generating fresh waves of archival photography (site surveys, heritage impact assessments, community documentation) that risk compounding existing redundancy if clean-up protocols are not standardised quickly. The Seine-Saint-Denis département alone has added thousands of images to shared municipal servers since tunnelling work accelerated in 2024.
The scale of the problem became clearer last autumn, when an internal review by Paris Data found that roughly one in six image files in the publicly searchable urban heritage collection existed in at least two versions — different resolutions, slightly different file names, or uploads from separate departmental sources without cross-referencing. The Médiathèque de l'Architecture et du Patrimoine, based in Charenton-le-Pont just south-east of the Périphérique, has been piloting an automated hash-matching tool since January 2026 to flag near-identical files before they are published. Progress is real but uneven: the Charenton pilot covers heritage photography but does not yet connect to the broader Paris Data infrastructure.
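How the Charenton tool works internally has not been described. As a rough illustration of what hash-matching for near-identical images can look like, the sketch below computes a simple average hash over a small grayscale grid and compares two hashes by Hamming distance. The function names, the 4x4 toy grids and the distance threshold are all illustrative assumptions, not details of the pilot.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255 (hypothetical input format)


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Flag two images whose hashes differ in at most max_distance bits (assumed threshold)."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Toy 4x4 grids standing in for images already downscaled to a common size.
original = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
# Same picture with a slight brightness shift, as a re-scan or re-export might produce.
rescan = [[min(255, v + 5) for v in row] for row in original]
# A different picture: left half bright, right half dark.
other = [[200, 200, 10, 10] for _ in range(4)]

print(is_near_duplicate(original, rescan))
print(is_near_duplicate(original, other))
```

Because the hash depends only on which pixels sit above the mean, a uniform brightness shift leaves it unchanged, which is why a scheme of this kind can catch copies that a byte-for-byte checksum would miss. A production tool would first downscale real images to the common grid, a step omitted here.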
What Amsterdam and Seoul Are Doing Differently
The contrast with Amsterdam is instructive. The Stadsarchief Amsterdam completed a comparable deduplication exercise across its 750,000-image online collection in 2023, using a combination of perceptual hashing and metadata reconciliation. The Dutch city built the protocol directly into its upload pipeline, meaning new files are screened before publication rather than audited after the fact. Seoul's Metropolitan Library applied a similar prevention-first model when it relaunched its digital portal in early 2025, integrating duplicate detection at the point of institutional submission from each of the city's 25 autonomous gu districts. Neither city has eliminated the problem entirely, but both have reduced post-publication duplication rates to below two percent of total holdings, according to publicly available annual reports from each institution.
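Neither institution's pipeline is documented in detail here, so the following is only a minimal sketch of the prevention-first idea: each submission is screened against already published records before it goes live, and a suspected duplicate is held for review along with a note on whether its metadata agrees. The class and field names, the identifiers, the hash values and the threshold are hypothetical, and the perceptual hash is assumed to have been computed elsewhere as an integer.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Submission:
    identifier: str  # hypothetical institutional identifier
    title: str
    phash: int  # perceptual hash computed upstream (assumed to be an integer)


def normalise_title(title: str) -> str:
    """Lowercase, strip accents and collapse whitespace so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


@dataclass
class UploadGate:
    max_distance: int = 4  # assumed Hamming-distance threshold
    published: List[Submission] = field(default_factory=list)

    def find_match(self, candidate: Submission) -> Optional[Submission]:
        """Return the first published record whose hash is close to the candidate's."""
        for record in self.published:
            if bin(record.phash ^ candidate.phash).count("1") <= self.max_distance:
                return record
        return None

    def submit(self, candidate: Submission) -> str:
        """Publish new images; hold suspected duplicates instead of publishing them."""
        match = self.find_match(candidate)
        if match is None:
            self.published.append(candidate)
            return "published"
        if normalise_title(match.title) == normalise_title(candidate.title):
            return f"held: duplicate of {match.identifier}, metadata agrees"
        return f"held: duplicate of {match.identifier}, metadata needs reconciliation"


# Invented records for illustration only.
gate = UploadGate()
first = gate.submit(Submission("A-001", "Boulevard de Belleville, démolition", 0b1111000011110000))
second = gate.submit(Submission("B-417", "boulevard de belleville,  demolition", 0b1111000011110001))
third = gate.submit(Submission("C-220", "Porte de Montreuil", 0b0000111100001111))
print(first, second, third, sep="\n")
```

The design choice worth noting is that the check runs inside `submit`, so nothing reaches the published list without being screened. That is the difference between screening before publication and auditing after the fact.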
Paris is not starting from scratch. The city's open-data team has run basic metadata deduplication on non-image datasets for several years, and the Direction des Systèmes et Technologies de l'Information manages centralised storage that, in theory, should make redundancy easier to catch. The gap is in image-specific tooling and, crucially, in the absence of a single mandatory submission gateway. Individual arrondissement administrations, cultural institutions, and project teams like those working on the Rénovation urbaine du secteur Porte de Montreuil can currently upload image assets to multiple platforms without automated cross-checking.
What Comes Next for Researchers and the Public
The Direction des Affaires Culturelles has indicated, through published budget documentation for the 2026 municipal year, that €1.2 million has been allocated to extend the Charenton-le-Pont pilot into a city-wide deduplication layer by the end of the first quarter of 2027. If that timeline holds, Paris would have a functioning system before the next major wave of Grand Paris Express archival material arrives from the Ligne 15 Est construction corridor.
For now, researchers using the Bibliothèque historique de la Ville de Paris or the Paris Data portal should treat image search results with some scepticism: the same photograph of, say, a demolished building on Boulevard de Belleville may appear under three different identifiers with three different provenance notes. Cross-referencing with the physical catalogue at the reading room on Rue de Rivoli remains the most reliable verification method. The city's clean-up is real and funded, but on the published timeline the finish line will not be reached before the end of the first quarter of 2027.