Paris Archives Go Digital, But Duplicate Images Are Costing the City Millions
A surge in redundant digital files across municipal databases is quietly draining public budgets and slowing access to the capital's heritage records.

Paris's major digitisation drive has a hidden problem: duplicate images. Across the city's network of municipal archives, cultural institutions and urban planning databases, redundant digital files now account for an estimated 30 to 40 percent of total stored data volume — a proportion that experts in library science and digital asset management say is well above acceptable thresholds for institutions of comparable size. The financial and administrative cost is measurable, and it is growing.
The issue has sharpened in 2026 because the Grand Paris Express construction programme, the Seine regeneration project and the ongoing activation of Paris 2024 Olympic legacy sites have all generated enormous quantities of photographic documentation. Planning authorities, heritage surveyors and infrastructure contractors each photograph the same structures, the same riverbanks, the same venues — often without coordinating with one another. The result is databases flooded with near-identical images filed under different reference numbers, eating storage budgets and making genuine archival retrieval slower and more expensive.
Storage is not cheap at institutional scale. Enterprise-grade cold storage for large image archives runs at roughly €0.02 to €0.05 per gigabyte per month for public-sector contracts in France, according to published framework pricing under the UGAP national purchasing body. A municipal photographic archive holding several hundred terabytes — realistic for a city the size of Paris — can therefore carry an annual storage overhead running into six figures in euros before staff time is counted. If a third of that volume is duplicate or near-duplicate content, the wasted expenditure is direct and quantifiable.
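The arithmetic behind that estimate can be sketched in a few lines. This is an illustration under stated assumptions, not an audited figure: the 300-terabyte archive size is a hypothetical stand-in for "several hundred terabytes", one terabyte is taken as 1,000 gigabytes, and the duplicate share is fixed at one third. Only the price range per gigabyte comes from the article.

```python
def annual_storage_cost_eur(terabytes, eur_per_gb_month):
    """Yearly storage cost in euros, taking 1 TB as 1,000 GB."""
    gigabytes = terabytes * 1000
    return gigabytes * eur_per_gb_month * 12


# Assumptions for illustration only (not audited values).
ARCHIVE_TB = 300            # hypothetical size: "several hundred terabytes"
PRICE_LOW_EUR = 0.02        # lower bound of the quoted price per GB per month
PRICE_HIGH_EUR = 0.05       # upper bound of the quoted price per GB per month
DUPLICATE_SHARE = 1 / 3     # "a third of that volume"

cost_low = annual_storage_cost_eur(ARCHIVE_TB, PRICE_LOW_EUR)
cost_high = annual_storage_cost_eur(ARCHIVE_TB, PRICE_HIGH_EUR)

# Share of the yearly bill spent on duplicate or near-duplicate files.
waste_low = cost_low * DUPLICATE_SHARE
waste_high = cost_high * DUPLICATE_SHARE

print(f"Annual storage cost: EUR {cost_low:,.0f} to {cost_high:,.0f}")
print(f"Spent on duplicates: EUR {waste_low:,.0f} to {waste_high:,.0f}")
```

Under these assumptions the yearly bill falls between roughly €72,000 and €180,000, of which about €24,000 to €60,000 pays for redundant files. A larger archive or a higher duplicate share scales both figures proportionally.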
The Archives de Paris, based on the Boulevard Sérurier in the 19th arrondissement, manages millions of digitised records ranging from nineteenth-century cadastral maps to recent urban permits. The Bibliothèque historique de la Ville de Paris, on the Rue Pavée in the 4th arrondissement, holds parallel photographic collections covering much of the same physical territory. Both institutions have undergone significant digitisation funding rounds since 2020, partly under the Culture Ministry's Plan de numérisation du patrimoine. Duplication between the two collections, and between each of them and the departmental databases maintained by the Mairie de Paris planning directorate, is a documented challenge that archivists have flagged in institutional reviews, though no consolidated public audit figure has been published to date.
Automated deduplication tools — software that compares image hashes or uses perceptual similarity algorithms to flag near-identical files — are now standard in commercial digital asset management. Several French municipalities, including Lyon and Bordeaux, have piloted such tools within the last three years. Paris has tested similar approaches within the Apur urban planning agency, which covers the city's cartographic and photographic stock, but a city-wide rollout across all connected institutions has not yet been formally announced.
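The two techniques mentioned above can be sketched as follows. This is a minimal illustration, not the method used by any of the institutions or tools named in this article. It assumes SHA-256 for exact matching and a simple "average hash" for perceptual similarity, and it assumes each image has already been reduced to a small grayscale grid. The file identifiers and the distance threshold are hypothetical.

```python
import hashlib
from collections import defaultdict


def exact_duplicates(files):
    """Group byte-identical files. `files` maps an identifier to raw bytes."""
    groups = defaultdict(list)
    for ref, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(ref)
    return [sorted(refs) for refs in groups.values() if len(refs) > 1]


def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image's mean, else 0.

    `pixels` is a 2D list of grayscale values, assumed to be already
    downsampled to a small grid (8x8 is a common choice).
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def near_duplicate(p1, p2, max_distance=2):
    """Flag two images whose average hashes differ in few bits.

    The threshold is an arbitrary illustrative value.
    """
    return hamming(average_hash(p1), average_hash(p2)) <= max_distance


# Hypothetical identifiers: the same scan filed under two reference numbers.
files = {"AP-001": b"facade-scan", "BHVP-77": b"facade-scan", "APUR-9": b"riverbank"}
print(exact_duplicates(files))

original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]    # same scene, slightly different exposure
different = [[200, 10], [30, 220]]   # inverted pattern, a different scene
print(near_duplicate(original, brighter), near_duplicate(original, different))
```

Exact hashing only catches files that are identical byte for byte. The perceptual comparison is what catches the same riverbank photographed twice or re-exported at a different exposure, at the cost of needing a tuned threshold and human review of borderline matches.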
The practical stakes extend beyond storage bills. Under the French loi pour une République numérique and subsequent open-data obligations, municipal archives are required to make digitised heritage materials accessible through public portals. When the same image exists under seven different file identifiers, search results become cluttered, metadata is inconsistent, and researchers — whether historians at the Sorbonne or architects working on the ZAC Rive Gauche redevelopment in the 13th arrondissement — waste time filtering noise instead of finding source material.
Digital records specialists point to a practical benchmark: a well-managed institutional image archive should carry duplicate rates below five percent after active deduplication. Closing the gap between that standard and the current estimated 30-to-40-percent range across Paris's fragmented systems would not only cut storage costs but accelerate the city's compliance with its own open-data commitments, which the Mairie de Paris has reaffirmed under its 2025-2030 digital strategy.
For now, the most concrete near-term step would be a formal cross-institutional audit — something archivists and digital managers at the Apur and the Archives de Paris have the technical capacity to conduct jointly. Without that baseline count, the city cannot know precisely what it is paying to store, or what it stands to recover. The duplicate image problem is solvable. The first number Paris needs is the one that tells it exactly how large the problem is.
About this article
Published by The Daily Paris