Paris's municipal archive system holds more than 14 million photographs, maps and digitised documents — and a significant share of them appear more than once. The Bibliothèque historique de la Ville de Paris, headquartered on the Rue des Francs-Bourgeois in the Marais, acknowledged the scale of the redundancy problem earlier this year when it launched a formal deduplication audit covering collections ingested since 2018. The audit is the first of its kind at this scale among French municipal institutions.
The timing is not accidental. The push to clean up these archives comes directly out of the Paris 2024 Olympics documentation effort. Hundreds of public agencies, cultural bodies and transport operators — including Île-de-France Mobilités and the Mairie de Paris communication department — contributed image sets during and after the Games, flooding central repositories with overlapping material. An internal working group estimated the redundancy rate across certain post-Games collections ran as high as 30 percent, making consistent public search results unreliable and inflating cloud storage costs. The Grand Paris Express project alone has produced thousands of site photographs across its 68 planned stations, many captured by multiple contractors working the same construction phases.
What Deduplication Actually Means on the Ground
The practical problem is more disruptive than it sounds. When a researcher at the Hôtel de Ville requests archival imagery of, say, the Porte de la Chapelle neighbourhood for a planning report, duplicate records surface with inconsistently tagged metadata (different dates, different photographer credits, different rights statuses) attached to what is essentially the same image file. That creates legal exposure around reuse rights and wastes researcher time. The Paris Region's urban planning agency, L'Institut Paris Région, flagged the issue in a methodological note circulated to member communes in the spring of 2026.
Paris is addressing this with a combination of perceptual hashing software and manual curatorial review — a hybrid model. Perceptual hashing generates a fingerprint for each image based on pixel patterns rather than file metadata, allowing near-duplicate images taken seconds apart, or scanned at different resolutions, to be flagged automatically. The Médiathèque de l'Architecture et du Patrimoine, based in Charenton-le-Pont just southeast of Paris, piloted the approach on its historical construction photography in 2025 and reported reducing its active duplicate count by roughly 22 percent over a six-month period, according to its published 2025 activity report.
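To make the fingerprinting idea concrete, here is a minimal sketch of one simple perceptual hash, the "average hash". It is an illustration only: the archives' actual software and algorithm are not specified, and the function names, the 8x8 grid and the distance threshold below are all assumptions chosen for the example. Images are represented as plain grids of grayscale values so the sketch runs without any imaging library.

```python
from typing import List

Gray = List[List[float]]  # rows of grayscale pixel values (0-255)


def shrink(pixels: Gray, size: int = 8) -> Gray:
    """Box-average the image down to a size x size grid of cells."""
    height, width = len(pixels), len(pixels[0])
    grid = []
    for r in range(size):
        r0 = r * height // size
        r1 = max((r + 1) * height // size, r0 + 1)
        row = []
        for c in range(size):
            c0 = c * width // size
            c1 = max((c + 1) * width // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a fingerprint with one bit per cell: 1 if the cell is brighter than the mean."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Flag a pair for review; the threshold is an assumed value, not a standard."""
    return hamming(average_hash(a), average_hash(b)) <= threshold
```

Because the fingerprint depends only on which regions are brighter than the image's own average, the same photograph scanned at a different resolution should yield the same or a very close fingerprint, while its file name, dates and credits play no part. In a hybrid workflow of the kind described above, pairs flagged by something like `is_near_duplicate` would then go to a curator rather than being merged automatically.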
How Paris Compares to London, Berlin and New York
Other major cities are further behind. The London Metropolitan Archives, holding records for Greater London, has publicly described its digital deduplication work as still in an early consultation phase as of late 2025. Berlin's Landesarchiv, managing records across the reunified city's fragmented pre-1990 institutional landscape, faces a structurally more complicated task: parallel East and West German photographic collections with overlapping subject matter but distinct provenance chains. New York City's Department of Records and Information Services has prioritised digitisation volume over deduplication quality, a sequencing choice that has left its online portal, NYC Municipal Archives, with known duplicate clusters across its 1930s Federal Art Project holdings.
Paris's advantage is partly institutional. The city's Direction des Affaires Culturelles has operated a unified digital asset management framework since 2021, giving it a single point of governance that cities like London, where archival responsibility is distributed across 32 boroughs and the City of London, struggle to replicate. That centralisation also makes cross-collection deduplication technically simpler, if not simple.
The cost of leaving duplicates in place is real. Cloud storage pricing for large uncompressed image files is not trivial, and institutions running redundant inventories pay for it twice: once in storage and again in staff hours. For archives and urban planning departments budgeting under municipal austerity pressures, the efficiency argument for deduplication is becoming harder to dismiss.
For researchers, journalists and urban planners working with Paris's public image collections, the practical advice is to use the Portail des bibliothèques de la Ville de Paris search interface rather than individual collection databases, at least until the Rue des Francs-Bourgeois audit concludes, which is expected by the end of the third quarter of 2026. The audit's findings are due to be presented to the city's cultural commission in October, and its methodology may subsequently be offered to other Grand Paris municipalities as a transferable framework.