Paris's municipal digital archive now holds more than 4.2 million photographs, a figure that ballooned sharply after the Paris 2024 Olympics documentation effort added an estimated 800,000 new image files to the city's servers at the Hôtel de Ville. The problem: a significant share of those files are duplicates, near-duplicates, or low-resolution copies sitting alongside higher-quality originals — wasting storage, distorting search results, and making the public record harder to navigate. City officials at the Direction de la Communication have acknowledged the backlog without specifying a remediation deadline.
The issue matters now because Paris is in the middle of a major push to digitise its urban heritage, tied directly to the Seine regeneration programme and the Grand Paris Express expansion. Both projects are generating tens of thousands of documentary images each year for planning records, public consultations, and press use. When the same crane photograph at Gare du Nord appears seventeen times under different filenames, it slows archivists and skews automated cataloguing systems. In one publicly documented case flagged by the Bibliothèque nationale de France in late 2025, such duplication caused a wrongly labelled image of the Porte de la Chapelle Arena to circulate in official planning documents.
What Paris Is Doing — and Where the Gaps Are
The city's primary response has been a pilot contract awarded in early 2026 to a consortium working with the Établissement public Paris La Défense and the Paris municipal archives on the Quai Henri-IV. The pilot uses perceptual hashing, a technique that generates a compact fingerprint for each image and flags matches above a defined similarity threshold, to identify and quarantine duplicates before they enter the permanent archive. The technology is not new — it has been standard in newsroom content management for several years — but its application to a municipal archive of this scale in France is being treated as a test case by the French Association of Municipal Archivists.
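To make the fingerprint-and-threshold idea concrete, here is a minimal sketch of one simple perceptual hashing variant, the average hash. It is illustrative only: the pilot's actual algorithm and threshold have not been published, so the 8x8 grid, the 0.9 similarity threshold, and every function name below are assumptions. The sketch works on plain grids of grayscale values; a real pipeline would first decode image files with an imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values

HASH_SIZE = 8  # assumed grid size: 8x8 cells give a 64-bit fingerprint


def shrink(pixels: Gray, size: int = HASH_SIZE) -> List[float]:
    """Downsample to size x size cells by averaging rectangular blocks.

    Assumes the image is at least size x size pixels.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Gray) -> int:
    """Return a 64-bit fingerprint: one bit per cell, set if brighter than the mean."""
    cells = shrink(pixels)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def similarity(hash_a: int, hash_b: int) -> float:
    """Share of matching bits between two fingerprints, from 0.0 to 1.0."""
    differing = bin(hash_a ^ hash_b).count("1")  # Hamming distance
    return 1.0 - differing / (HASH_SIZE * HASH_SIZE)


def is_probable_duplicate(a: Gray, b: Gray, threshold: float = 0.9) -> bool:
    """Flag a pair whose similarity meets the (assumed) threshold."""
    return similarity(average_hash(a), average_hash(b)) >= threshold


# Synthetic demo data: a 32x32 gradient, a half-resolution copy, and an inverted image.
original = [[4 * x + 4 * y for x in range(32)] for y in range(32)]
low_res_copy = [
    [
        (original[2 * y][2 * x] + original[2 * y][2 * x + 1]
         + original[2 * y + 1][2 * x] + original[2 * y + 1][2 * x + 1]) // 4
        for x in range(16)
    ]
    for y in range(16)
]
inverted = [[248 - value for value in row] for row in original]

print(is_probable_duplicate(original, low_res_copy))
print(is_probable_duplicate(original, inverted))
```

Because the fingerprint is computed from a heavily downsampled version of the image, a low-resolution copy of the same picture lands on (nearly) the same bits as its original, which is why the approach catches near-duplicates that a byte-for-byte checksum would miss. The threshold is the tuning knob: set too low, distinct photographs of the same site get quarantined together.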
Amsterdam's Stadsarchief began a comparable deduplication programme in 2023 and by March 2025 had reduced its active image catalogue by roughly 31 percent without deleting a single unique record, according to a published case study from the International Council on Archives. Seoul's Metropolitan Government completed a similar exercise across its Han River Development Authority documentation in 2024, cutting redundant storage costs by the equivalent of approximately €180,000 annually. Paris, by contrast, has not yet published measurable targets or a public cost estimate for its programme.
London's situation is instructive. The Greater London Authority holds photographic records across at least fourteen separate departmental servers, and a 2024 audit commissioned by the London Legacy Development Corporation, the body overseeing the Queen Elizabeth Olympic Park, found that duplicate or near-duplicate images accounted for around 22 percent of stored files. London has since mandated a unified metadata standard for all new image uploads, a step Paris's archivists have discussed but not yet formalised.
The Practical Stakes for Paris Residents and Planners
For ordinary Parisians, the clearest consequence is in public-facing platforms. The Apur — Atelier Parisien d'Urbanisme — runs the city's open data portal, where residents and researchers search for images related to neighbourhood planning. Duplicate entries inflate result counts, pushing down genuinely distinct images and making searches for, say, housing density studies near the Boulevard Périphérique more cumbersome than they should be. Apur has said it is working to align its image standards with the city archive pilot, though no formal integration date has been set.
The broader lesson from Amsterdam and Seoul is that deduplication only holds if new images are tagged correctly at the point of ingest — not cleaned up retroactively years later. Archivists working on the Grand Paris Express documentation, which will eventually cover 68 new stations across Île-de-France, have pushed for mandatory metadata protocols before the first images from new stations at Saint-Denis Pleyel and Le Bourget are officially filed. Whether the city adopts those standards before the next documentation surge — expected when the Line 15 South section opens — will determine how manageable the archive remains.
For researchers and journalists requesting images through the city's open data portal today, the practical advice is straightforward: filter by file size and upload date before downloading, and cross-check with the BnF's Gallica platform, which runs its own deduplication layer and tends to surface cleaner, verified copies of civic imagery.
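For anyone working from an exported list of search results rather than the portal's own interface, the size-and-date filter can be sketched in a few lines of Python. This is a hypothetical illustration: the portal's export schema is not described here, so the record fields, the filenames, and the choice to rank larger files first (on the assumption that they are more likely to be the higher-quality originals) are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class ImageRecord:
    # Hypothetical fields; the real portal export may name these differently.
    filename: str
    size_bytes: int
    uploaded: date


def shortlist(
    records: Iterable[ImageRecord],
    min_bytes: int,
    earliest: date,
    latest: date,
) -> List[ImageRecord]:
    """Keep records within the size and date bounds, largest files first."""
    kept = [
        r for r in records
        if r.size_bytes >= min_bytes and earliest <= r.uploaded <= latest
    ]
    # Larger files first; ties broken by earlier upload date.
    return sorted(kept, key=lambda r: (-r.size_bytes, r.uploaded))


# Invented sample records for illustration only.
results = [
    ImageRecord("crane_copy_small.jpg", 180_000, date(2025, 3, 2)),
    ImageRecord("crane_original.tif", 24_000_000, date(2025, 2, 14)),
    ImageRecord("crane_reupload.jpg", 3_500_000, date(2025, 6, 30)),
    ImageRecord("crane_old_scan.jpg", 5_000_000, date(2019, 9, 1)),
]

for record in shortlist(results, 1_000_000, date(2025, 1, 1), date(2025, 12, 31)):
    print(record.filename)
```

The filter narrows the list but does not prove that two files show the same photograph, so the cross-check against a second, verified source still applies.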