Digital archiving tools

William Maddox wmaddox at
Tue Mar 8 19:33:38 CST 2011

Many of us maintain large collections of bits that we'd like to preserve over a long time, and distribute, replicate, and migrate via unreliable storage media and networks.  As disk sizes (and archive sizes) have increased, the probability of corruption undetected or uncorrected by the mechanisms normally built into disk drives, network protocols, and filesystems has increased to a level that warrants great concern.

I would be interested to know if there exists an archive format that has the following desirable properties:

1) It is well-documented, and relatively simple, to facilitate its implementation on many platforms present and future.

2) It supports some degree of incremental updating, but need not be particularly efficient about it.  An explicit compaction operation is preferable to an overly complex format.  It is adequate to use append-only strategies appropriate for write-once media.

3) Utilities for inserting and extracting files, copying archives, and otherwise manipulating them support end-to-end verification that identical bits have been stably recorded to the media, bypassing or defeating platform-level or hardware-level caching mechanisms.  Where this is not possible, the limits must be carefully delineated, with some basis for determining the properties of the platform and certifying reliability properties where possible.

4) The format should provide superior error detection, designed to avoid the failure modes common to the mechanisms typically used in hardware.  For example, use a document-level cryptographic checksum rather than a block-level CRC.

5) The format should include a high degree of internal redundancy and recoverability, say, along the lines of a virtual RAID-array.
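
To make points 3 and 4 concrete, here is a minimal Python sketch of a write-then-verify step using a whole-file SHA-256 digest. The function names (sha256_of_file, store_verified) are made up for illustration, and SHA-256 is only one possible choice of cryptographic checksum. Note the limit that point 3 warns about: os.fsync asks the operating system to flush its buffers, but the read-back may still be served from the OS page cache or a drive cache, so a matching digest here does not by itself prove the bits are stable on the media.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the whole file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_verified(data, path):
    """Write data, request a flush to the device, then re-read and compare.

    Hypothetical helper: returns the digest so it can be recorded
    alongside the file and checked again after every later copy.
    """
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push its buffers to the device
    actual = sha256_of_file(path)
    if actual != expected:
        raise IOError("digest mismatch after write: %s" % path)
    return expected
```

Recording the returned digest with the archive lets any later copy or migration be checked with sha256_of_file alone, independent of whatever CRCs the hardware applies along the way.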

Just as biological organisms constantly correct DNA transcription errors,
the idea is to have a format that is robust across long-term exposure to
imperfect copying and transmission channels.
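
As a rough illustration of the redundancy asked for in point 5, the following Python sketch keeps one XOR parity block over fixed-size data blocks, in the manner of single-parity RAID, plus a SHA-256 digest per block to locate damage. Everything here (the names xor_blocks, encode, and repair, the block layout, zero padding of the last block) is an assumption made for the example, not an existing archive format. It can rebuild only one damaged block per parity group, assumes non-empty input, and does not store the original length.

```python
import hashlib


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def encode(data, block_size):
    """Split data into zero-padded blocks; return (blocks, parity, digests)."""
    blocks = [data[i:i + block_size].ljust(block_size, b"\0")
              for i in range(0, len(data), block_size)]
    parity = xor_blocks(blocks)
    digests = [hashlib.sha256(b).digest() for b in blocks]
    return blocks, parity, digests


def repair(blocks, parity, digests):
    """Return the blocks with at most one digest-failing block rebuilt."""
    bad = [i for i, b in enumerate(blocks)
           if hashlib.sha256(b).digest() != digests[i]]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("single parity cannot repair %d bad blocks" % len(bad))
    # XOR of all surviving blocks and the parity yields the missing block.
    others = [b for i, b in enumerate(blocks) if i != bad[0]]
    rebuilt = xor_blocks(others + [parity])
    if hashlib.sha256(rebuilt).digest() != digests[bad[0]]:
        raise ValueError("rebuilt block fails its digest; parity may be damaged")
    fixed = list(blocks)
    fixed[bad[0]] = rebuilt
    return fixed
```

A format meant to survive several simultaneous failures would need a stronger code, such as Reed-Solomon, and would have to protect the digests and parity themselves; the sketch only shows the detect-then-rebuild cycle.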

Does anything like this exist?


More information about the cctech mailing list