Paris's public digital archives contain more than 4.2 million photographs, and by the city administration's own internal estimates, somewhere between 15 and 25 percent of those images are duplicates, near-duplicates, or functionally redundant files that have been stored, backed up, and licensed multiple times over. The problem did not appear overnight. It accumulated across two decades of uncoordinated digitisation drives, departmental silos, and procurement cycles that prioritised volume over hygiene.
The reckoning has arrived partly because of money, and partly because of embarrassment. The Grand Paris Express construction programme, which has been photographed by at least four separate contractors since groundbreaking ceremonies began at stations including Noisy-Champs and Saint-Denis Pleyel, has become a case study in what happens when no single authority controls the image pipeline. By early 2025, the Société des grands projets (the former Société du Grand Paris, renamed in 2024) had acknowledged internally that its visual asset library ran to hundreds of thousands of files with no deduplication protocol in place. That situation became harder to ignore as the project approached its later delivery phases and communications teams struggled to locate authoritative imagery quickly.
The Accumulation: How a Two-Decade Drift Became a Crisis
The roots go back to the early 2000s, when the Bibliothèque nationale de France on the Quai François-Mauriac was scaling up its digitisation programmes and arrondissement-level mairies simultaneously started scanning local heritage collections with no shared metadata standard. Each institution used its own file-naming convention. Many used none at all. Images shot at the COP21 climate conference at Le Bourget in 2015, where the Paris Agreement was adopted, were later found in at least six separate municipal repositories under different filenames, different compression levels, and different rights attributions: a small example of a systemic failure.
The Paris 2024 Olympics accelerated the problem dramatically. The organising committee, the Mairie de Paris communications directorate, the Préfecture de Paris, and multiple partner agencies all commissioned photography of the same venues (the Trocadéro esplanade, the Grand Palais, the Seine riverside installations), producing overlapping libraries that were never reconciled. After the Games, when the city began rolling out its Olympic legacy activities through the Paris 2024 Héritage programme, archive staff found themselves inheriting a chaotic and legally tangled image estate.
Storage costs are not trivial. Commercial cloud storage rates for large public institutions in France typically run between €0.02 and €0.05 per gigabyte per month, and high-resolution photography files are large: a single uncompressed RAW file from a professional camera can exceed 80 megabytes. Applied to the city's own estimate, 15 to 25 percent of 4.2 million images means roughly 630,000 to 1.05 million redundant files. Hold each of those several times over in redundant systems, backups, and data centres, and the annual overhead can run into the hundreds of thousands of euros. That is public money, spent on storing the same image of the Seine at sunset filed under seventeen different names.
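The arithmetic behind that cost estimate can be sketched in a few lines of Python. The archive size, the 15 to 25 percent duplicate share, and the €0.02 to €0.05 price range come from the figures quoted here; the assumption that every duplicate is a full 80 MB RAW file (decimal units) and the number of stored copies are illustrative choices, not data from the archives.

```python
"""Back-of-envelope annual cost of storing duplicate photographs.

Archive size, duplicate share and price range follow the figures in the text.
FILE_SIZE_GB and the number of copies are illustrative assumptions.
"""

TOTAL_PHOTOS = 4_200_000      # archive size quoted in the text
FILE_SIZE_GB = 80 / 1000      # assumption: every duplicate is an 80 MB RAW file (decimal units)
MONTHS_PER_YEAR = 12


def annual_cost_eur(duplicate_share: float, price_per_gb_month: float, copies: int = 1) -> float:
    """Yearly cost, in euros, of holding the duplicate files `copies` times over."""
    duplicate_files = TOTAL_PHOTOS * duplicate_share
    stored_gb = duplicate_files * FILE_SIZE_GB * copies
    return stored_gb * price_per_gb_month * MONTHS_PER_YEAR


if __name__ == "__main__":
    # Low end: 15% duplicates at EUR 0.02/GB/month; high end: 25% at EUR 0.05/GB/month.
    for copies in (1, 3):  # hypothetical number of stored copies per file
        low = annual_cost_eur(0.15, 0.02, copies)
        high = annual_cost_eur(0.25, 0.05, copies)
        print(f"copies={copies}: EUR {low:,.0f} to EUR {high:,.0f} per year")
```

Under these assumptions a single copy of the duplicates costs about €12,100 to €50,400 a year, and three copies raise that to about €36,300 to €151,200. How far the total climbs therefore depends mostly on how many times each redundant file is held across systems.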
What Comes Next: Standards, Tools, and Institutional Will
The Direction des Affaires Culturelles de la Ville de Paris announced in March 2026 that it would adopt a unified digital asset management framework across all city-funded cultural institutions by the end of the year. The framework includes perceptual hashing, a technique that identifies visually identical or near-identical images regardless of filename or format, as well as mandatory IPTC metadata fields at the point of ingest. The Centre Pompidou on the Place Georges-Pompidou and the Musée Carnavalet on the Rue de Sévigné are both listed as pilot institutions in the programme.
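Perceptual hashing reduces an image to a compact fingerprint of its visual structure, so two files showing the same picture produce similar fingerprints even when their names, formats, or compression levels differ. The announcement does not say which algorithm the framework uses. The sketch below implements average hashing (aHash), one of the simplest variants, in plain Python. It assumes images have already been decoded to grayscale pixel grids (real pipelines would decode and resize with an imaging library), and its match threshold of 5 differing bits is an illustrative choice, not a published setting.

```python
"""Minimal average-hash (aHash) sketch for near-duplicate detection.

Assumes images are already decoded to grayscale 2-D lists of ints (0-255).
The threshold in is_near_duplicate is illustrative, not a standard value.
"""
from typing import List

Grid = List[List[int]]
HASH_SIZE = 8  # 8x8 cells -> 64-bit hash


def downsample(pixels: Grid, size: int = HASH_SIZE) -> List[List[float]]:
    """Shrink to size x size by averaging rectangular blocks (box filter)."""
    h, w = len(pixels), len(pixels[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    out = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        row = []
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid) -> int:
    """One bit per cell, row by row: 1 if the cell is brighter than the mean."""
    cells = [v for row in downsample(pixels) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits."""
    return hamming(average_hash(a), average_hash(b)) <= threshold


if __name__ == "__main__":
    # Synthetic 64x64 gradient, a uniformly brightened copy, and a mirrored copy.
    original = [[(r + c) * 2 for c in range(64)] for r in range(64)]
    brighter = [[v + 3 for v in row] for row in original]
    mirrored = [row[::-1] for row in original]
    print(is_near_duplicate(original, brighter))  # prints True
    print(is_near_duplicate(original, mirrored))  # prints False
```

Because each cell is compared with the image's own mean brightness, the uniformly brightened copy hashes identically, while the mirrored copy differs in 32 of its 64 bits and is not matched. Deployed systems often favour sturdier variants, such as difference hashes or DCT-based hashes, but the principle of comparing compact fingerprints rather than filenames is the same.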
For individual Parisians or journalists working with public image databases, the practical advice is straightforward: treat any photograph sourced from a city or state archive before mid-2026 as potentially unverified for rights status, and cross-reference against the primary collection before publication. The cleaner, standardised repositories are not expected to be fully operational until at least Q1 2027.
Getting here took twenty years of institutional drift and one very large sporting event. Fixing it will take another eighteen months of unglamorous database work. That is, in most respects, exactly how these things go.