Paris city administrators confirmed this spring that duplicate and AI-generated images now account for an estimated one in five submissions to the Bibliothèque nationale de France's digital acquisition pipeline — a proportion that has roughly doubled since 2023. The figure, which emerged from an internal review by the BnF's digital collections directorate, has pushed the French capital to the front of a global reckoning over how institutions catalogue, verify and purge redundant visual content from public archives.
The timing is not accidental. The Paris 2024 Olympics generated an unprecedented volume of photographic and digital media — hundreds of thousands of images logged by Ville de Paris agencies, sports federations and cultural bodies in a compressed eighteen-month window. That deluge exposed weaknesses in legacy content management systems that were never designed to handle AI-assisted image generation at scale. Officials at the Direction des Affaires Culturelles, the city's cultural arm, began auditing duplicate image rates across municipal databases in January 2025, and the results prompted an accelerated procurement process for dedicated deduplication software.
What Paris Is Actually Doing
The centrepiece of the city's response is a pilot programme running across two institutions: the Médiathèque de la Canopée in the 1st arrondissement and the Bibliothèque Marguerite Yourcenar in the 15th. Both sites are testing a hash-based visual fingerprinting system supplied by a French tech firm under a contract that runs through December 2026. The software flags near-duplicate images — not just exact copies — by comparing perceptual hashes, catching the kind of subtly altered AI variants that fool conventional deduplication tools.
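The city has not disclosed which algorithm its supplier uses, so what follows is only an illustrative sketch of the general idea behind perceptual hashing, not the pilot's actual software. It uses the "average hash", one of the simplest perceptual hashes, and assumes each image has already been reduced to a small greyscale grid (8x8 here). The function names and the distance threshold of 5 are hypothetical choices made for this example.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # small greyscale grid, values 0-255


def average_hash(pixels: Grid) -> int:
    """Build a bit string: 1 where a pixel is brighter than the grid's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits.

    max_distance is an assumed threshold for illustration only.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Synthetic 8x8 gradient standing in for a downscaled photograph.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
# Slightly brightened copy: every pixel value differs, the structure does not.
brightened = [[min(255, p + 3) for p in row] for row in original]
# Inverted copy: a structurally different image.
inverted = [[255 - p for p in row] for row in original]

print(original == brightened)                   # False: not an exact copy
print(is_near_duplicate(original, brightened))  # True: same fingerprint
print(is_near_duplicate(original, inverted))    # False: fingerprints differ
```

An exact-match tool comparing file contents would treat the brightened copy as a new image, while the fingerprint comparison groups it with the original. Production systems typically use more robust hashes and tune the threshold against real collections.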
The Médiathèque de la Canopée, which sits inside the Forum des Halles commercial complex and serves one of the city's highest-footfall neighbourhoods, processes roughly 4,000 digitised document submissions per month. Staff there say the pilot flagged several hundred duplicate image clusters in its first quarter of operation, though the institution has not published a formal breakdown. The Bibliothèque Yourcenar, a smaller branch with a strong community-digitisation programme, is running a parallel track focused specifically on locally submitted heritage photographs, a category particularly vulnerable to duplication as residents scan and resubmit old family images.
How Paris Compares Globally
London's situation offers a useful contrast. The British Library has acknowledged the duplicate image problem publicly but has not launched a dedicated pilot comparable to Paris's. Its broader digital preservation strategy, updated in 2025, references the issue under general data integrity headings without committing to specific deduplication targets or timelines. Berlin is further behind: under the Senate's cultural administration, the Zentral- und Landesbibliothek Berlin still relies on manual spot-checks for digital submissions from community partners.
Tokyo presents a more competitive picture. The Tokyo Metropolitan Library completed a system-wide deduplication audit in March 2026 covering its digital image holdings across 23 ward libraries, and announced that roughly 12 percent of its visual archive contained redundant entries. That is a lower rate than the BnF's one-in-five figure, although the two are not directly comparable: Tokyo's number describes existing holdings, while the BnF's describes incoming submissions. It was still significant enough to prompt a city-wide remediation budget of ¥340 million (approximately €2.1 million) for the current fiscal year. Tokyo's approach leans heavily on vendor-supplied cloud processing rather than the locally hosted model Paris has chosen, a distinction that matters because the locally hosted model avoids data sovereignty questions that cloud processing would raise under French and EU law.
New York's public library system, which operates across the Bronx, Manhattan and Staten Island branches under the New York Public Library umbrella, has pursued deduplication largely through its existing digital asset management contracts rather than a standalone programme. The NYPL's digitised collections team flagged the duplicate image issue in a 2025 annual report but noted that resource constraints had delayed comprehensive remediation.
For Paris residents and researchers who use municipal archives, the practical upshot is straightforward: search results across platforms linked to the Archives de Paris and Gallica, the BnF's online library, should become measurably cleaner by early 2027 if the current pilots are rolled out system-wide. The Direction des Affaires Culturelles has indicated a full procurement decision is expected before the end of the third quarter of 2026. Whether the budget allocated — reported internally at around €800,000 for the first phase — proves sufficient will depend heavily on how quickly AI image generation continues to scale. The pressure is real, and Paris, for now, has at least moved faster than most.