France's national and municipal archivists declared this week that the problem of duplicate images clogging public digital collections had reached a tipping point. The Archives de Paris, based on the Boulevard Sérurier in the 19th arrondissement, confirmed Thursday that an automated deduplication sweep of its online catalogue had flagged more than 14,000 redundant photograph records accumulated since the institution's digitisation drive began in earnest in 2019. The cleanup, still underway, is the largest single correction the institution has undertaken on its digital holdings.
The timing is not accidental. The Paris 2024 Olympic legacy programme, which formally links the Games' infrastructure investment to longer-term cultural and civic goals, has pushed several city-funded institutions to open their digitised archives more aggressively to the public. When more people search, the problems surface faster. Duplicate entries don't merely waste server space; they confuse researchers, skew the usage statistics used to justify future digitisation budgets, and, in several documented cases at the Archives de Paris, have led online users to request physical documents that were already held in a different format in another part of the same building.
What Triggered This Week's Action
The immediate catalyst was a joint audit commissioned in May by the Bibliothèque historique de la Ville de Paris, on the Rue de Sévigné in the Marais, and the Musée Carnavalet next door. The two institutions share a digitisation pipeline managed under the Paris Musées consortium. The audit, whose results were circulated to consortium members on 30 June, found that roughly one in eight image files ingested since 2021 appeared in the database at least twice, sometimes under different catalogue numbers and occasionally with contradictory metadata. Postcard views of the Pont-Neuf from the early 1900s were among the most duplicated categories, appearing in some cases six or seven times across the shared system.
Since January, Paris Musées, which coordinates digital access for 14 municipal museums, has been running a pilot of open-source deduplication software developed with support from the French Ministry of Culture's Service du Livre et de la Lecture. The software uses perceptual hashing, a technique that identifies visually near-identical images regardless of file name or metadata, rather than simple filename comparison, which had failed to catch the problem in earlier cleanup attempts. As of Wednesday, the pilot had processed approximately 280,000 image files across three test collections and returned a duplication rate of around 11 percent, higher than administrators had publicly estimated before the audit.
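For readers curious how perceptual hashing differs from comparing file names, the idea can be sketched in a few lines of Python. The consortium has not said which algorithm its pilot software uses; the example below assumes an "average hash", one of the simplest members of the family, and represents each image as a plain grid of grayscale values so that it needs no imaging library. The function names and the distance threshold of 5 are illustrative choices, not details from the pilot.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale pixel values, 0-255


def shrink(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Downsample to size x size by averaging the pixels in each block.

    Assumes the image is at least size pixels wide and high.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    # The threshold is an illustrative assumption, not a value from the pilot.
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic test images: a horizontal gradient, a slightly brighter scan of
# the same picture, and an unrelated vertical gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
rescanned = [[min(255, v + 10) for v in row] for row in original]
unrelated = [[y * 8 for _ in range(32)] for y in range(32)]

print(is_near_duplicate(original, rescanned))
print(is_near_duplicate(original, unrelated))
```

Because the fingerprint depends only on the broad pattern of light and dark, a rescan saved under a new file name or catalogue number still lands close to the original, which is exactly the case filename comparison misses.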
What It Means for Access and the Institutions
For researchers and the general public using platforms like Paris en images or the Carnavalet's own online collection portal, the cleanup should eventually mean faster, cleaner search results. The Grand Paris Express construction project — now extending metro lines across the inner banlieue — has itself generated significant archival material documenting demolished and transformed neighbourhoods in communes like Saint-Denis and Vitry-sur-Seine, and those recent records have not been immune to duplication errors either.
The practical stakes go beyond tidiness. French cultural institutions receive part of their operating subsidy from the Ministry of Culture through formulas that partially account for the size and accessibility of their catalogued holdings. Inflated record counts caused by duplication have been a quiet concern among auditors at the Cour des comptes, France's public spending watchdog, for several years. An institution that resolves the problem proactively is in a stronger position when the next budget cycle opens, which for Paris Musées is expected in early autumn 2026.
Administrators at the Archives de Paris said this week they expect the current sweep to conclude by the end of July. Once complete, the corrected catalogue will be re-indexed and made publicly searchable. Institutions holding collections in the Paris Musées consortium are encouraged to submit their own image libraries for deduplication review before October, when the Ministry of Culture's next digitisation grant round opens. Researchers who have saved direct URLs to specific catalogue entries should check those links after the re-index, since some catalogue numbers are expected to change as duplicate records are merged.
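Why catalogue numbers change during a merge can be shown with a short Python sketch. It is purely illustrative: the Archives de Paris has not described how it picks the surviving record or whether it will publish a table of old and new numbers. The example assumes the surviving record is simply the one whose number sorts first, and the catalogue numbers shown are invented.

```python
from typing import Dict, List


def merge_duplicates(groups: List[List[str]]) -> Dict[str, str]:
    """Map every catalogue number in a duplicate group to the one that survives.

    Assumption: the survivor is the number that sorts first in its group.
    """
    redirects: Dict[str, str] = {}
    for group in groups:
        keeper = min(group)
        for number in group:
            redirects[number] = keeper
    return redirects


def resolve(number: str, redirects: Dict[str, str]) -> str:
    # Numbers that were never part of a duplicate group are unchanged.
    return redirects.get(number, number)


# Hypothetical catalogue numbers for three copies of the same photograph.
redirects = merge_duplicates([["PH-0042", "PH-0007", "PH-0130"]])
print(resolve("PH-0130", redirects))
print(resolve("PH-9999", redirects))
```

A saved link to any record that is merged away would need a lookup of this kind to reach the surviving entry, which is why links should be rechecked after the re-index.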