Paris's municipal digital infrastructure is carrying tens of thousands of duplicate images across its public-facing archives, a problem that accumulated quietly over nearly a decade of aggressive scanning campaigns and has now reached a point where storage costs and retrieval failures are forcing a formal response. The city's Direction des Affaires Culturelles, which oversees heritage digitisation across arrondissements, confirmed earlier this year that a rationalisation programme is underway, though the scale of the redundancy problem has not been officially quantified in public documents reviewed by The Daily Paris.
The timing matters. Paris 2024 Olympics legacy commitments pushed the city to accelerate public access to cultural assets — photographs of Seine-Saint-Denis urban renewal, Grand Paris Express construction documentation, and heritage imagery from sites including the Bibliothèque nationale de France's Richelieu site and the Musée Carnavalet. Speed, not deduplication, was the priority. Multiple departments uploaded independently, cross-referencing was minimal, and the same photograph of, say, a Haussmann façade on Boulevard de Magenta could end up indexed under three separate project headings with different filenames.
The Pipeline That Created the Problem
The roots go back to around 2017, when the city began a structured push to digitise neighbourhood-level planning records and architectural photographs as part of the broader Grand Paris vision. Institutions including Paris Musées — the network that groups 14 municipal collections — and the Atelier Parisien d'Urbanisme each ran their own digitisation workflows. Both used external contractors. Neither was required, at the time, to submit files through a single deduplication gateway before ingestion into shared storage systems.
Between 2015 and 2021, the Bibliothèque historique de la Ville de Paris on Rue Pavée processed more than 1.2 million image files, according to figures the institution published in its annual activity report. Cross-institutional estimates for total duplicated assets across the municipal ecosystem range widely in internal discussions, but archivists at several institutions have described the figure as running into the hundreds of thousands of files, enough to meaningfully inflate storage costs and degrade search result quality on public portals.
The problem is not unique to Paris. Madrid's city archive and Berlin's Landesarchiv have both grappled with analogous redundancy issues following accelerated pandemic-era digitisation. What distinguishes the Paris case is the layered governance: cultural institutions here operate with significant autonomy, and there is no single technical authority with the power to mandate a unified ingestion standard across all city-linked bodies.
What a Fix Actually Requires
Deduplication at this scale is not simply a matter of running a hash-matching script. Many of the redundant images are not pixel-identical — they are scans of the same original photograph made at different resolutions, or crops taken at different stages of processing. That means automated tools catch only a portion of the problem. Human review, cross-referenced against acquisition records, is required for the remainder. Archivists familiar with the process describe it as painstaking work that does not photograph well for a press release.
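To make the distinction concrete, the sketch below contrasts the two kinds of matching. It is an illustration only, not the city's tooling: it assumes images are already decoded into grayscale grids (lists of rows of 0-255 values, at least 8 pixels on each side), uses a simple "average hash" as one common perceptual-hashing technique, and the `max_distance` threshold of 5 and the synthetic striped test images are hypothetical choices.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Exact-match fingerprint: changes if even one byte differs."""
    return hashlib.sha256(data).hexdigest()


def shrink(pixels, size=8):
    """Downscale a grayscale grid to size x size by averaging blocks.

    Assumes the grid is at least `size` pixels wide and tall.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8) -> int:
    """Perceptual fingerprint: one bit per cell, set if brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def likely_same_image(a, b, max_distance=5) -> bool:
    """Flag a pair for human review; the threshold is an illustrative guess."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


def stripes(n, vertical=True, offset=0):
    """Synthetic n x n test image: eight alternating light and dark bands."""
    return [
        [(200 if ((x if vertical else y) * 8 // n) % 2 == 0 else 50) + offset
         for x in range(n)]
        for y in range(n)
    ]


def raw(pixels) -> bytes:
    """Flatten a grid into bytes, standing in for file contents."""
    return bytes(v for row in pixels for v in row)


master = stripes(64)             # stand-in for a high-resolution scan
rescan = stripes(32, offset=3)   # same picture, lower resolution, brighter
other = stripes(64, vertical=False)  # a genuinely different picture

print(content_digest(raw(master)) == content_digest(raw(rescan)))
print(likely_same_image(master, rescan))
print(likely_same_image(master, other))
```

In this toy case the byte-level digests of the two scans differ while their perceptual fingerprints match, which is the gap the archivists describe. A fingerprint match only nominates a candidate pair; it cannot tell a crop from a rescan or say which file is the master, so the check against acquisition records remains a human task.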
The city's current approach, according to tender documents posted to the BOAMP procurement platform in early 2026, is a phased audit that begins with the Paris Musées digital asset system and then expands to the planning and urban records held by the Atelier Parisien d'Urbanisme. No completion date for the first phase has been published.
For researchers and journalists who rely on these systems — including those accessing the digitised collections through the Paris Data open portal at data.paris.fr — the practical advice for now is straightforward: treat any image retrieved from municipal archives as potentially one of several versions, verify file provenance through the acquiring institution directly, and request a high-resolution master file rather than relying on the web-accessible derivative. The underlying collections are rich. Getting to the right file, without duplication muddying the path, is the work that remains.