The Daily Paris

Paris news, every day


Paris Archives Are Drowning in Duplicate Digital Images — and the Numbers Tell a Costly Story

From the Bibliothèque nationale de France to the city's urban planning offices, redundant image files are eating storage budgets and slowing the digital projects Paris cannot afford to delay.

By Paris News Desk · Published 4 July 2026, 8:45 pm


Photo: Daria Agafonova on Pexels

Paris's public institutions collectively hold tens of millions of digitised images across their servers — and a growing share of that archive is redundant. The problem of duplicate digital images, long treated as a back-office headache, has ballooned into a measurable budget drain at exactly the moment the city is accelerating digital infrastructure work tied to the Paris 2024 Olympic legacy and the Grand Paris Express metro rollout.

The issue matters now because the city and the national government are both midway through ambitious digitisation drives. The Bibliothèque nationale de France, which operates its Gallica platform from its François-Mitterrand campus on the Quai François-Mauriac in the 13th arrondissement, catalogues over 15 million digitised documents. Industry-standard audits of large institutional image libraries typically find that between 20 and 30 percent of stored image files are either exact duplicates or near-identical variants, the product of batch scanning errors, multiple uploads from field teams, and version-control failures in project management software. Applied to the BnF's holdings alone, and counting each document as a single file, that range would imply roughly 3 to 4.5 million redundant files consuming storage capacity that costs real money.

The Storage Bill Nobody Wants to Talk About

Cloud and on-premises storage is not free. Enterprise-grade archival storage in France currently runs at roughly €0.02 to €0.04 per gigabyte per month for institutional contracts, according to publicly available pricing from French data-hosting providers such as OVHcloud, headquartered in Roubaix. A single high-resolution archival image scan can run to 50 megabytes or more. At that size, every million duplicate files occupies about 50 terabytes and costs roughly €1,000 to €2,000 a month to keep, so a backlog of several million duplicates runs to five or six figures a year. For city-adjacent agencies such as the Atelier Parisien d'Urbanisme, known as APUR and based in the 13th arrondissement, duplicate imagery pulled from drone surveys, satellite passes, and field photography for Seine waterfront regeneration projects compounds the problem year over year.
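The arithmetic behind that storage bill can be sketched in a few lines. This is a rough estimate, not an audited figure: it assumes each of the 15 million documents cited for Gallica is a single 50-megabyte file, applies the 20 to 30 percent duplicate range, and prices the result at €0.02 to €0.04 per gigabyte per month. The function name and the use of decimal gigabytes (1 GB = 1,000 MB) are illustrative choices.

```python
def monthly_duplicate_cost(total_files, duplicate_share, mb_per_file, eur_per_gb_month):
    """Estimate the monthly cost in euros of storing duplicate files."""
    duplicate_files = total_files * duplicate_share
    duplicate_gb = duplicate_files * mb_per_file / 1000  # decimal gigabytes
    return duplicate_gb * eur_per_gb_month


# Figures quoted in the article; one document = one 50 MB file is an assumption.
low = monthly_duplicate_cost(15_000_000, 0.20, 50, 0.02)
high = monthly_duplicate_cost(15_000_000, 0.30, 50, 0.04)

print(f"Monthly: EUR {low:,.0f} to EUR {high:,.0f}")
print(f"Yearly:  EUR {low * 12:,.0f} to EUR {high * 12:,.0f}")
```

On these assumptions the duplicates would cost roughly €3,000 to €9,000 a month, or about €36,000 to €108,000 a year. Larger files, backup copies, or higher-tier storage would push the figure up.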

The Grand Paris Express project, managed by Société des grands projets (SGP, known until 2023 as Société du Grand Paris) from its headquarters in Saint-Denis, has generated an extraordinary volume of photographic and scan data since groundbreaking began in earnest after 2016. Construction documentation, heritage impact surveys, and BIM-linked site photography for the 200-kilometre network's 68 new stations have all fed into document management systems where version discipline is inconsistently enforced. No public figure for duplicate-image volume across SGP's systems has been released, but the project's sheer documentation scale, covering work sites from Orly in the south to Le Bourget in the north, makes the risk structurally significant.

What the Data-Cleaning Push Actually Involves

Deduplication is not new technology. Perceptual hashing algorithms, which assign a fingerprint to each image and flag files that are identical or closely similar, have been commercially available since at least 2010. The French national digital preservation body, the Centre informatique national de l'enseignement supérieur, or Cines, based in Montpellier, has published guidelines on digital preservation that address redundancy management. The challenge for large Parisian institutions is less technical than organisational: deduplication requires centralised authority over file systems that are often managed by competing departmental teams with incompatible naming conventions.
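To make the fingerprinting idea concrete, here is a minimal sketch of one simple perceptual hash, the "average hash". It is not the method used by any institution named here. It works on a tiny grayscale grid supplied as a list of rows; production tools would first decode and shrink each image to such a grid, a step omitted here. The function names and the two-bit threshold are hypothetical.

```python
def average_hash(pixels):
    """Return a bit string: 1 where a pixel is brighter than the image mean.

    `pixels` is a small grayscale grid (list of rows, values 0-255).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_near_duplicate(pixels_a, pixels_b, max_distance=2):
    """Flag two images whose hashes differ in at most `max_distance` bits."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Toy 4x4 "scans": the rescan is the same picture, slightly brighter.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
rescan = [[value + 5 for value in row] for row in original]
# A different picture: bright and dark halves swapped.
different = [row[2:] + row[:2] for row in original]

print(is_near_duplicate(original, rescan))
print(is_near_duplicate(original, different))
```

Because the hash depends on relative brightness rather than exact bytes, the brighter rescan gets the same fingerprint as the original, while the rearranged image does not. A byte-level checksum would treat all three files as unrelated.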

Researchers at Paris institutions who rely on Gallica or on APUR's open data portal have long flagged slow query responses and inconsistent metadata as symptoms of bloated, poorly curated image databases. The city's open data initiative, launched under the Paris DataCity framework, committed in its 2023-2026 roadmap to reducing redundancy in municipal datasets — though image files were not explicitly called out in the roadmap's published targets.

The practical path forward involves three steps that archivists and IT managers at comparable European institutions — including the Rijksmuseum in Amsterdam and the Biblioteca Nacional in Madrid — have already completed: a full hash-based audit to quantify the duplicate share, a governance protocol that designates a single master file per image event, and a scheduled purge cycle tied to the institution's backup calendar. For Paris's agencies, the logical moment to act is before the next wave of Grand Paris Express documentation arrives with the network's scheduled partial opening on Line 15 South, currently targeted for late 2026. Waiting means the duplicate pile only grows — and so does the bill.
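The three steps above can be sketched as a short script. This is an illustration under stated assumptions, not any institution's actual workflow. It finds exact duplicates only, using SHA-256 content hashes, and the rule for choosing the master file (the alphabetically first path) is a placeholder for whatever a governance protocol would specify. It also only lists purge candidates and deletes nothing. File names in the demonstration are hypothetical.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's bytes in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root):
    """Step 1: group every file under `root` by content hash."""
    groups = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            groups[sha256_of(path)].append(path)
    return groups


def duplicate_share(groups):
    """Fraction of files that are redundant copies of another file."""
    total = sum(len(paths) for paths in groups.values())
    return (total - len(groups)) / total if total else 0.0


def plan_purge(groups):
    """Steps 2 and 3: keep one master per group, list the rest for purging."""
    masters, purge = [], []
    for paths in groups.values():
        ordered = sorted(paths)  # placeholder master rule: first path wins
        masters.append(ordered[0])
        purge.extend(ordered[1:])
    return sorted(masters), sorted(purge)


# Demonstration on a throwaway folder.
with tempfile.TemporaryDirectory() as root:
    samples = [("a.tif", b"scan-1"), ("a_copy.tif", b"scan-1"),
               ("b.tif", b"scan-2"), ("b_v2.tif", b"scan-2!")]
    for name, content in samples:
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(content)
    groups = audit(root)
    share = duplicate_share(groups)
    masters, purge = plan_purge(groups)
    purge_names = [os.path.basename(path) for path in purge]

print(f"Duplicate share: {share:.0%}")
print("Purge candidates:", purge_names)
```

In the demonstration only `a_copy.tif` is flagged, because `b_v2.tif` differs from `b.tif` by one byte. Near-identical variants of that kind need a perceptual comparison on top of the exact-hash audit.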


About this article

Published by The Daily Paris

This article was produced by The Daily Paris editorial desk and covers news in Paris. See our editorial standards for how we use AI.
