The Bibliothèque nationale de France flagged the problem formally in late 2024: somewhere between 18 and 22 percent of the photographic assets held across its Gallica digital platform were functional duplicates — same image, different metadata tags, different file names, clogging search results and distorting archival research. By early 2025, the BnF had launched an internal working group to address it. Now, more than a year on, Paris has quietly become one of the more methodical cities in the world at confronting what archivists call the duplicate image replacement problem — the messy, expensive business of identifying redundant visual files in civic databases and replacing or retiring them without destroying legitimate historical variants.
The timing matters. Legacy commitments from the Paris 2024 Olympics required the City of Paris to make large volumes of event photography available through open-data portals, including the Paris Open Data platform run out of the Hôtel de Ville. That rapid ingestion of imagery (sports, infrastructure, crowd scenes) aggravated a pre-existing problem. Images shot by multiple accredited photographers at the same event, from near-identical angles, were uploaded with different attribution metadata, making automated deduplication unreliable. The result was a searchable archive with significant visual clutter, creating real headaches for urban planners, journalists, and researchers trying to build accurate records of how public space in districts like Saint-Denis and the Plaine de France looked before and after the Games.
What Paris Is Actually Doing
The city is not working alone. The Atelier Parisien d'Urbanisme, known as APUR, has been coordinating with the BnF and the city's Direction des Affaires Culturelles (DAC) since the spring of 2025 on a joint protocol for duplicate detection across civic visual assets. The protocol combines perceptual hashing, a technique that fingerprints image content so that visually similar files match even when their file names and metadata differ, with manual curatorial review for assets flagged as historically significant. APUR, whose offices sit near the Place du Châtelet, handles the urban planning imagery; the DAC manages the cultural heritage photographs.
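For readers unfamiliar with perceptual hashing, the idea can be sketched in a few lines. The example below uses average hashing, one of the simplest variants, chosen here purely for illustration; the article's sources do not say which algorithm or threshold the Paris protocol uses. The function names, the 8x8 grid size, and the threshold of 5 bits are all assumptions.

```python
from typing import List

# A grayscale image as rows of pixel values (0-255); all rows the same length.
Grid = List[List[int]]


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Shrink the image to size x size cells by block-averaging, then emit
    one bit per cell: 1 if the cell is brighter than the overall mean."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0 = r * height // size
        r1 = max((r + 1) * height // size, r0 + 1)
        for c in range(size):
            c0 = c * width // size
            c1 = max((c + 1) * width // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag a pair for review when their hashes differ in few bits.
    The threshold is a hypothetical value, not one taken from the protocol."""
    return hamming(average_hash(a), average_hash(b)) <= threshold
```

Because the hash depends on relative brightness rather than exact bytes, a re-exported or slightly brightened copy of a photograph lands on the same or a nearby hash, while a different composition does not. The same property explains why near-identical shots by different photographers are hard to separate automatically, and why a human reviewer stays in the loop for significant assets.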
Neither London nor Berlin has a comparable city-wide protocol in place. London's equivalent body, the Greater London Authority, digitises significant volumes of planning imagery but relies on individual borough archives to manage deduplication independently — a decentralised model that archivists at institutions including the London Metropolitan Archives have described in published conference papers as inconsistent. Berlin's Landesarchiv operates a more centralised system but has not publicly committed resources specifically to duplicate image replacement as a defined programme. Amsterdam's Stadsarchief has gone furthest among comparable European municipal archives, launching a perceptual hashing pilot in 2023 that, according to the archive's own published annual report for that year, reduced duplicate records in its visual collections by roughly 14 percent within twelve months.
The Cost and the Competition
Cost is the central tension. Curatorial review of flagged duplicates in Paris is estimated internally — based on figures cited in APUR's publicly available 2025 work programme — at between €40 and €65 per image for complex heritage cases requiring specialist sign-off. At scale, across the tens of thousands of Olympics-era images alone, that arithmetic becomes a significant budget line. The BnF's Gallica platform, which serves researchers across France and receives roughly 15 million document consultations per year according to the library's own published statistics, has justified the expenditure partly on search-quality grounds: cleaner archives return more precise results, reducing user frustration and server load.
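To make that arithmetic concrete, here is a minimal sketch that multiplies a case count by the €40 to €65 per-image range cited above. The number of images that would actually need specialist sign-off is not published, so any count passed in is a hypothetical input, and the function name is invented for this illustration.

```python
from typing import Tuple


def review_cost_range(
    n_images: int, low_eur: float = 40.0, high_eur: float = 65.0
) -> Tuple[float, float]:
    """Return the (lowest, highest) total cost in euros for curatorial review
    of n_images complex cases, using the per-image range cited for Paris."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    return n_images * low_eur, n_images * high_eur
```

If, say, 10,000 images turned out to be complex heritage cases, the total would fall between €400,000 and €650,000. That figure is an illustration of scale only, not an estimate drawn from APUR's work programme.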
Seoul offers a different model worth watching. The Seoul Metropolitan Government's digital archive team published a methodology paper in 2024 describing a near-fully automated deduplication pipeline that requires human review only for assets older than 1970. The automation rate reportedly exceeded 90 percent of flagged files. Paris has not adopted automation at that level, partly because the DAC argues that the city's pre-digital photographic heritage — much of it documenting neighbourhoods like Belleville and the Marais during mid-century urban renewal — carries too much contextual risk for machine-only decisions.
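The routing rule described for Seoul can be sketched as a single decision. This is an illustration under stated assumptions, not the Seoul team's code: "older than 1970" is read as "created before 1970", assets with no recorded year are sent to a person as a cautious default, and the function name and return labels are invented.

```python
from typing import Optional


def route_flagged_asset(year: Optional[int], cutoff_year: int = 1970) -> str:
    """Decide who handles a file flagged as a possible duplicate.
    Assets created before the cutoff, or with no known year (an assumption
    made here), go to a human; everything else is handled automatically."""
    if year is None or year < cutoff_year:
        return "human_review"
    return "automated"
```

Moving the cutoff later, or adding conditions such as a heritage flag, would shift more files toward human review, which is in effect the trade-off the DAC is arguing for.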
For institutions and researchers who rely on these archives, the practical upshot is straightforward: check update logs on the Paris Open Data portal and Gallica regularly through the end of 2026, as both platforms are expected to roll out revised image catalogues on a quarterly basis as the deduplication work progresses. APUR has indicated it will publish a public-facing methodology note before the end of the year, which should allow independent researchers to understand which images were retired and why — a transparency step that neither London nor Berlin has committed to matching.