Paris has moved further than almost any comparable European city in tackling the problem of duplicate images clogging its public digital archives, but a cleaner record is proving harder to maintain than officials anticipated when the effort began in earnest in 2023. Three years in, the city's cultural institutions are still finding redundant files at a rate that surprises even the teams assigned to remove them.
The issue matters more now than it might seem. With the Grand Paris Express metro expansion generating thousands of construction-phase photographic records each month, and the Paris 2024 Olympics legacy programme continuing to deposit images into municipal and national repositories, the volume of raw visual data held across Parisian institutions has grown sharply. Storage costs money: the Bibliothèque nationale de France's 2024 annual report noted that digital asset management was consuming a growing share of its infrastructure budget. Duplicate files also distort search results, complicate rights clearance, and slow public access to genuine historical material.
At the Médiathèque de la Ville de Paris, archivists working within the Direction des Affaires Culturelles launched a deduplication audit covering holdings linked to the Paris 2024 programmes. Staff there have used perceptual hashing tools, software that identifies near-identical images even when file names or metadata differ, to process collections that previously required manual review. The Atelier Parisien d'Urbanisme (APUR), the city's urban planning agency, runs a parallel effort for its cartographic and photographic records of the Seine riverbank regeneration corridor, where drone surveys in particular tend to produce large clusters of near-identical frames.
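The article does not name the hashing tools Paris uses, but the idea behind perceptual hashing can be shown with one of its simplest forms, the "average hash". The sketch below shrinks an image to an 8x8 grid, sets one bit per cell depending on whether that cell is brighter than the grid's mean, and compares two images by counting differing bits (the Hamming distance). It is a minimal illustration, assuming images already decoded into grayscale pixel grids; the grid size and the function names `downsample`, `average_hash` and `hamming` are hypothetical choices, and production tools often use more robust variants.

```python
from typing import List

# A grayscale image as rows of pixel intensities (0-255). Decoding real
# JPEG/TIFF files would need an imaging library; that step is omitted here.
Grid = List[List[float]]


def downsample(pixels: Grid, size: int = 8) -> Grid:
    """Shrink an image to size x size by averaging rectangular blocks of pixels."""
    h, w = len(pixels), len(pixels[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    out: Grid = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        row = []
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Pack one bit per cell: 1 if the cell is brighter than the mean of all cells."""
    cells = [v for row in downsample(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests near-identical images."""
    return bin(a ^ b).count("1")


# Usage with synthetic 16x16 images (hypothetical data, not archive holdings).
original = [[r * 12 + c for c in range(16)] for r in range(16)]
brightened = [[v + 20 for v in row] for row in original]  # uniform exposure change
upside_down = original[::-1]                               # stands in for a different frame

print(hamming(average_hash(original), average_hash(brightened)))   # → 0
print(hamming(average_hash(original), average_hash(upside_down)))  # → 64
```

Because every bit is a comparison against the image's own mean, a uniform change in brightness leaves the hash untouched, which is why renamed, re-exported or slightly re-exposed copies can still be matched even when their file names and metadata differ.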
How Paris Compares With London, Amsterdam and New York
London's situation offers a useful contrast. The London Metropolitan Archives, which holds records spanning more than 900 years, launched its own digital deduplication project in 2022, initially focused on post-war planning photography held across borough councils. Progress has been slower than in Paris, partly because responsibilities are fragmented across 32 boroughs and the City of London rather than consolidated under a single municipal authority. Amsterdam's Stadsarchief, the city archive, is widely regarded as the benchmark for deduplication practice in Europe: it has invested in automated workflows since 2019 and, according to its published documentation, achieved an 80 percent reduction in duplicate entries within its photographic collections by 2024. Paris is tracking toward comparable results but started later and from a larger base.
New York City's approach through the Municipal Archives differs structurally. American public records law means holdings are often partially duplicated across federal, state and city systems as a deliberate redundancy measure, so aggressive deduplication carries legal risk that European archivists do not face in the same way. French archival law imposes no comparable redundancy requirement, giving the Direction des Affaires Culturelles more latitude to delete confirmed duplicates rather than simply flagging them.
The financial stakes are not trivial. Cloud storage pricing for large cultural institutions in France has risen alongside general infrastructure costs since 2022. Some estimates within the profession put duplication rates across the uncurated digital collections of major European city archives at between 15 and 30 percent, which means holding hundreds of thousands of redundant image files at a tangible, recurring cost. The BnF's Gallica platform alone hosts millions of digitised items, and at that scale even a modest duplication rate compounds quickly.
What Comes Next for Parisian Archives
The practical question facing institutions along the Rue de Richelieu corridor, where the BnF's historic Richelieu site sits alongside several specialist collections, is whether deduplication can become genuinely continuous rather than a periodic audit exercise. The tools exist. The constraint is staffing: archival metadata work requires human judgment for edge cases that automated tools flag but cannot resolve, particularly where deciding whether two images are duplicates depends on contextual records held in separate systems.
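The division of labour described here, with software flagging candidate pairs and people resolving the ambiguous ones, can be sketched as a simple threshold triage. This is a minimal sketch, assuming the hashing step reports a Hamming distance for each flagged pair; the two cut-offs, the queue names, the `CandidatePair` type and the record identifiers are all hypothetical, and the source does not describe how Parisian institutions actually route their edge cases.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical cut-offs on the Hamming distance between two 64-bit image hashes.
AUTO_CONFIRM_MAX = 2  # at or below: treated as a confirmed duplicate
REVIEW_MAX = 10       # above AUTO_CONFIRM_MAX, at or below this: needs an archivist


@dataclass
class CandidatePair:
    record_a: str
    record_b: str
    distance: int  # Hamming distance reported by the hashing step


def triage(pairs: List[CandidatePair]) -> Dict[str, List[CandidatePair]]:
    """Route each flagged pair to an automatic, human-review, or no-action queue."""
    queues: Dict[str, List[CandidatePair]] = {"confirmed": [], "review": [], "distinct": []}
    for pair in pairs:
        if pair.distance <= AUTO_CONFIRM_MAX:
            queues["confirmed"].append(pair)
        elif pair.distance <= REVIEW_MAX:
            # Ambiguous: the decision may depend on context held in other systems.
            queues["review"].append(pair)
        else:
            queues["distinct"].append(pair)
    return queues


# Hypothetical record identifiers for drone survey frames.
sample = [
    CandidatePair("drone-0001", "drone-0002", 1),
    CandidatePair("drone-0001", "drone-0047", 7),
    CandidatePair("quai-0100", "quai-0388", 31),
]
queues = triage(sample)
for name, items in queues.items():
    print(name, [p.record_b for p in items])
```

Where the cut-offs sit is a policy choice rather than a technical one: raising `REVIEW_MAX` sends more pairs to people and fewer to automatic handling, so the staffing constraint amounts to how large the middle queue is allowed to grow.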
For members of the public who use Paris's digital collections through platforms such as Paris en Images or the BnF's Gallica, the immediate benefit is a cleaner, faster search experience as the work progresses. Researchers working on Seine regeneration history or Olympics documentation should expect improved discovery from late 2026 onward as the current audit cycles conclude. Amsterdam's experience suggests the gains are real and durable, provided institutions commit to embedding the process rather than treating it as a one-time fix.