Paris's leading cultural institutions are scrambling to address a growing duplicate-image problem in their online collections, after internal reviews this week confirmed that several major databases have been corrupted by repeated and mislabelled files introduced during accelerated digitisation drives. The issue, which affects institutions on both the Right and Left Banks, has quietly undermined years of public investment in making France's cultural heritage accessible online.
The timing is awkward. France committed to a major expansion of digitised public collections as part of the legacy agenda tied to the Paris 2024 Olympics, when officials promoted cultural access as a democratic dividend of the Games. Two years on, the rushed ingestion of hundreds of thousands of image files into public portals has left metadata in disarray, with duplicates cluttering search results and, in some cases, the wrong image attached to a catalogue record altogether.
What Went Wrong — and Where
The Bibliothèque nationale de France (BnF), whose Gallica platform hosts more than nine million digitised documents and images, confirmed this week that an internal audit launched in April had identified tens of thousands of duplicate image files introduced between 2023 and early 2026. The problem is concentrated in photographic and postcard collections, where batch-scanning contracts with external firms produced redundant files that were ingested without adequate deduplication checks. The BnF's main site on the Rue de Richelieu and its François-Mitterrand campus in the 13th arrondissement both feed into Gallica, and synchronisation errors between the two digitisation pipelines appear to have compounded the problem.
The Musée Carnavalet, the city's museum of Parisian history on the Rue de Sévigné in the Marais, faces a related but distinct challenge. When the museum reopened in 2021 after a major renovation, its digital catalogue was rebuilt from scratch. Staff acknowledged this week, without giving specific figures, that records from the old and new systems had been merged in ways that produced duplicate images now proving labour-intensive to untangle. Visitors using the online collection to research 19th-century street photography of Haussmann-era Paris have reported finding the same image listed under multiple accession numbers.
The Paris City Archives (Archives de Paris), on the Boulevard Sérurier in the 19th arrondissement, is understood to be running a parallel clean-up exercise. The archives digitised more than 400,000 items between 2022 and 2025, partly funded through the Grand Paris Express infrastructure programme's ancillary cultural commitments.
The Cost of Getting It Wrong
Duplicate-image errors are not merely an administrative inconvenience. Researchers, journalists and educators who rely on these databases for accurate attribution can inadvertently reproduce mislabelled images in published work, creating downstream errors that are difficult to correct. In one documented case circulating among digital archivists this week, a photograph of the Pont de l'Alma taken in the 1890s had been catalogued under three separate dates spanning fifteen years, because the same scan had been ingested through different batch processes.
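The Pont de l'Alma case points to the simplest kind of duplicate to catch. When the very same file enters a catalogue through more than one batch, its bytes are identical, so a cryptographic hash of the file content flags the copies regardless of what dates or labels the metadata carries. The Python sketch below assumes each catalogue record can be paired with the raw bytes of its image file; the function name and record identifiers are hypothetical, and this is not a description of any institution's actual pipeline.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_content(records: List[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each SHA-256 digest to the record IDs sharing those exact bytes."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record_id, image_bytes in records:
        groups[hashlib.sha256(image_bytes).hexdigest()].append(record_id)
    # Keep only digests attached to more than one record: byte-identical duplicates
    return {digest: ids for digest, ids in groups.items() if len(ids) > 1}


# Usage: one file ingested through three batches, plus one unrelated file
same_scan = b"placeholder for the bytes of one scanned image file"
records = [
    ("batch-A/0001", same_scan),
    ("batch-B/0417", same_scan),
    ("batch-C/0093", same_scan),
    ("batch-A/0002", b"placeholder for a different scan"),
]
for digest, ids in group_by_content(records).items():
    print(digest[:12], ids)  # one group listing the three batch IDs
```

Exact hashing only catches byte-identical copies. Two separate scans of the same print differ byte for byte, even if they look the same, and need a similarity measure rather than an equality check.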
The broader financial context matters here. France's Ministère de la Culture allocated roughly €30 million to digitisation programmes between 2022 and 2025, according to public budget documents. Institutions that accepted those funds are now under pressure to demonstrate that the money produced reliable, searchable, deduplicated collections rather than file counts inflated by duplicates.
Several institutions are now piloting AI-assisted perceptual hashing tools to identify duplicates that slipped past earlier checks. Rather than relying on metadata alone, these tools derive a compact fingerprint from each image's visual content, so that two files showing the same picture can be matched even when their catalogue records disagree. The BnF is expected to publish preliminary findings from its audit before the end of July, and a working group convened by the Direction générale des médias et des industries culturelles is due to meet again on 15 July to coordinate standards across institutions.
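The institutions have not said which tools they are piloting. As a minimal sketch of the general idea, the Python below implements a difference hash (dHash), one common perceptual-hashing technique, and compares two fingerprints by counting differing bits. It assumes images have already been decoded to greyscale pixel grids (a real pipeline would use an imaging library to load, convert and resample the files), and the function names and 10-bit match threshold are illustrative assumptions, not any institution's settings.

```python
from typing import List

Grid = List[List[int]]  # rows of greyscale values, e.g. 0 (black) to 255 (white)


def downsample(pixels: Grid, width: int, height: int) -> Grid:
    """Nearest-neighbour resize to width x height (a crude stand-in for proper resampling)."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair in a small thumbnail."""
    small = downsample(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits  # hash_size * hash_size bits: 64 for the default hash_size of 8


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Grid, b: Grid, max_distance: int = 10) -> bool:
    """Hypothetical rule: treat images whose hashes differ by at most max_distance bits as the same."""
    return hamming(dhash(a), dhash(b)) <= max_distance


# Usage: a synthetic "scan" and a uniformly brightened copy of it
scan = [[(x * 37 + y * 11) % 200 for x in range(64)] for y in range(64)]
rescan = [[v + 30 for v in row] for row in scan]
print(likely_duplicates(scan, rescan))  # True: brightening preserves every left/right comparison
```

Because dHash records only whether each pixel is brighter than its neighbour, a uniform change in exposure between two scanning runs leaves the fingerprint unchanged, which is why this family of techniques can catch duplicates that byte-level or metadata checks miss. Rotations, heavy crops and retouching can still defeat it, so flagged pairs would normally go to a human for confirmation.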
For researchers and members of the public using these platforms in the meantime, archivists advise checking any image found on Gallica or the Carnavalet's online portal against the catalogue number of the physical item before citing it. When in doubt, the most reliable route to a verified original is a direct inquiry to the relevant reading room; both the BnF's Salle des manuscrits and the Carnavalet's documentation centre remain open to accredited researchers.