Paris Archives and City Databases Race to Purge Duplicate Images This Week
A coordinated push across municipal digital platforms is exposing how years of redundant visual data have quietly inflated storage costs and muddied public records.

City archivists and digital records managers at the Bibliothèque historique de la Ville de Paris confirmed this week that a systematic duplicate-image replacement operation is under way across several major municipal databases, with a hard deadline set for 31 July 2026. The effort targets tens of thousands of redundant image files that have accumulated since the city's mass digitisation push accelerated ahead of the Paris 2024 Olympics.
The timing matters. Paris committed in 2023 to making its full photographic archive of public spaces — from the Seine riverbanks to the Grand Paris Express construction corridor — available via open-data portals by the end of this year. Duplicate entries have blocked that rollout. City technicians say certain image sets appear three or four times under different file names, a legacy of siloed digitisation efforts run by separate departments that never reconciled their datasets.
The bulk of the duplication is traced to two institutions. The Atelier parisien d'urbanisme, known as APUR, ran its own photographic survey of Seine-Saint-Denis and the inner banlieues between 2021 and 2023 as part of Grand Paris Express planning documentation. Separately, the city's Direction de l'Urbanisme commissioned street-level imagery of arrondissements one through eight. When both datasets were ingested into the central Paris Data platform on the Rue du Louvre, automated quality checks flagged more than 40,000 image-pair duplicates — files that are visually identical but stored with different metadata tags and creation timestamps.
The RATP's infrastructure photography archive, used internally for maintenance documentation along the Ligne 14 extension to Orly, surfaced a smaller but still significant redundancy cluster. Engineers working on the Villejuif-Gustave Roussy station section found that inspection images from 2022 site surveys had been uploaded in two separate batches, creating a records ambiguity that complicates compliance reporting under French public infrastructure law.
Duplicate visual data is not simply a storage nuisance. Under rules set out in France's loi pour une République numérique, public-body datasets made available for reuse must be accurate and non-duplicative. Researchers and urban planners who query Paris Data can receive distorted outputs — overrepresenting certain street segments or building façades — when the same image registers twice as distinct evidence. The Paris Housing Observatory, which uses city imagery to track illegal short-term rental conversions around Montmartre and the Canal Saint-Martin, flagged the problem to city officials in April 2026 after its analysis software returned inflated counts for several postcodes.
The current operation uses perceptual hashing — a technique that generates a compact digital fingerprint of each image based on its visual content rather than its file name — to identify near-identical pairs. Where a duplicate is confirmed, the older or lower-resolution version is flagged for removal and a canonical master file is retained with consolidated metadata. City technicians told colleagues at a working session at the Hôtel de Ville on 1 July that roughly 18,000 duplicates have been resolved since the operation began on 23 June, leaving an estimated 22,000 to process before the month-end deadline.
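For readers curious how such a fingerprint can work, here is a minimal sketch of one simple variant, the "average hash". It is an illustration only: the article does not say which algorithm or software the city uses, and the function names, the 8x8 grid and the distance threshold of 5 are all assumptions made for this example. Images are represented as plain grids of grayscale values at least 8 pixels on each side.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values (hypothetical input format)


def shrink(img: Gray, size: int = 8) -> List[float]:
    """Downscale to size x size cells by averaging rectangular blocks.

    Assumes the image is at least `size` pixels wide and tall.
    """
    h, w = len(img), len(img[0])
    cells = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(img: Gray, size: int = 8) -> int:
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    cells = shrink(img, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Treat two images as a duplicate pair if their fingerprints nearly match.

    The threshold is an assumed value for illustration, not a figure from the city.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Because the fingerprint depends only on which regions are brighter than the image's own average, a copy saved under a different file name, or slightly brightened, produces the same or nearly the same bits, while a visually different image differs in many positions.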
Storage costs have sharpened political attention. Paris's municipal cloud contract, managed through the Direction des Systèmes et Technologies de l'Information, runs to several million euros annually. Redundant image files across departmental servers have been identified internally as consuming an estimated 12 percent of active storage allocation, capacity that could otherwise support the city's expanding real-time sensor network monitoring Seine water quality and air pollution along the Boulevard Périphérique.
For members of the public who use Paris Data to download neighbourhood imagery for academic or commercial projects, the practical advice from the platform's support team is to wait until after 1 August before pulling large image datasets. Files downloaded before that date may include duplicates that will subsequently be removed, leaving gaps in any locally cached collections. A versioning notice will be posted to the Paris Data portal confirming when the cleaned dataset goes live. Researchers working with the Paris Urban Ecology agency on Seine bank regeneration mapping have already been asked to re-export their image selections after the deadline passes.
Published by The Daily Paris