Digital archiving tools
aek at bitsavers.org
Fri Mar 11 21:28:26 CST 2011
On 3/11/11 5:45 PM, Rich Alderson wrote:
>> I guess the big philosophical issue I have with these schemes is do
>> you wrap something around a data container (try to make it self-
>> describing), or leave it as two separate containers ("read-only" and
>> "updatable").
> Where do you come down on that question?
Leaving the original container as-is, and having the metadata completely
separate. This lets you know that the container should never change,
so it can be written to a write-once medium, or copied, knowing that
the data will never intentionally change. The data that describes the
container, though, is known to change over time.
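As a rough sketch of that separation (not how bitsavers or CHM actually does it): the container is only ever read, and the description lives in a separate, rewritable file that records the container's checksum. The function names, the JSON layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the container in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(container: Path, metadata_dir: Path, description: dict) -> Path:
    """Write (or rewrite) the metadata record; the container is never modified.

    Hypothetical layout: one JSON file per container, kept in a separate
    directory so the container tree can live on write-once media.
    """
    metadata_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "container": container.name,
        "sha256": sha256_of(container),
        "description": description,
    }
    sidecar = metadata_dir / (container.name + ".json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


def container_unchanged(container: Path, sidecar: Path) -> bool:
    """Check that the container still matches the checksum stored beside it."""
    record = json.loads(sidecar.read_text())
    return record["sha256"] == sha256_of(container)
```

The description can be rewritten as often as needed with write_sidecar; container_unchanged only ever compares the stored checksum against the bits on disk.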
> My personal preference would be for something of a meta-filesystem,
> similar to the Mac OS X packaging for applications (a philosophical
> outgrowth of the resource/data forks idea simplified by using a file
> system to implement it).
Browsability or not: a long-running argument at CHM over how to store assets.
Bitsavers, and my working internal archive, are arranged hierarchically.
The big difference between the two is that while bitsavers is arranged almost
exclusively by source or vendor, the internal one is really a collection of
archives with differing structures depending on content.
Metadata is minimal (as Richard has noted in bitsavers).
The 'real' CHM archive started out as a hierarchy, but switched to directories
named after the month when the asset was added, since it was becoming increasingly
awkward and time-consuming to come up with categories, and there was no way to automate
the repository ingest process (we had hundreds of pictures taken of objects created
while cataloging the collection with no manufacturer directories for them to go into).
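To illustrate why month-named directories make ingest easy to automate, here is a minimal sketch. The "YYYY-MM" directory name and the ingest function are assumptions for the example; the actual CHM naming scheme and tooling are not described here.

```python
import shutil
from datetime import date
from pathlib import Path


def ingest(asset: Path, archive_root: Path, added: date) -> Path:
    """Copy an asset into the directory for the month it was added.

    No category has to be chosen by hand: the destination depends only
    on the ingest date (assumed "YYYY-MM" naming).
    """
    month_dir = archive_root / f"{added.year:04d}-{added.month:02d}"
    month_dir.mkdir(parents=True, exist_ok=True)
    dest = month_dir / asset.name
    if dest.exists():
        # Containers are meant to be immutable, so never overwrite one.
        raise FileExistsError(dest)
    shutil.copy2(asset, dest)
    return dest
```

A call such as ingest(Path("photo_0001.jpg"), Path("/archive"), date.today()) needs no decision about manufacturer or category, which is the point of the scheme.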
There was a decision that the collections database (Mimsy) was going to be the primary
way to locate assets in the archive. This is the logical way to go. Hierarchical databases
(which is really what you're creating with directory trees) are known to be less flexible
than relational ones.
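A small sketch of the flexibility argument, using SQLite from the Python standard library rather than Mimsy (whose schema is not shown here). The table and column names are hypothetical. The same asset can be found by manufacturer or by kind, regardless of which directory its container sits in.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog; table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE asset ("
        " id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL UNIQUE,"
        " manufacturer TEXT,"
        " kind TEXT)"
    )
    return conn


def add_asset(conn, path, manufacturer, kind):
    """Record where a container lives and what is known about it."""
    conn.execute(
        "INSERT INTO asset (path, manufacturer, kind) VALUES (?, ?, ?)",
        (path, manufacturer, kind),
    )


def find_paths(conn, manufacturer=None, kind=None):
    """Locate containers by any combination of attributes."""
    clauses, params = [], []
    if manufacturer is not None:
        clauses.append("manufacturer = ?")
        params.append(manufacturer)
    if kind is not None:
        clauses.append("kind = ?")
        params.append(kind)
    sql = "SELECT path FROM asset"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY path"
    return [row[0] for row in conn.execute(sql, params)]
```

A directory tree fixes one access path (say, vendor first); here the query picks whichever attribute is convenient, and an asset with no known manufacturer still has a place to go.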
The directory hierarchy is a stopgap to provide some structure for the containers
until more detailed metadata can be created. If a catalog record is created at the time
the container is added to the archive, and there is an efficient way to browse in the database,
structuring the containers hierarchically isn't necessary.