archiving data, was RE: Media Longitevity/Care (Long perhaps OT)
Jerome H. Fine
jhfinexgs2 at compsys.to
Sat Mar 19 19:11:01 CST 2005
>McFadden, Mike wrote:
>I think one thing everyone forgets is there are appropriate high and low
>tech options for each archiving task.
Jerome Fine replies:
I think you addressed the long term problem, but
did not discuss the solution. Some have mentioned it,
but perhaps the solution needs to be emphasized AGAIN!
From what I understand, if any files need to be
kept and accessed for more than 2 to 3 years,
the only way to be sure is to actually copy the
files AGAIN as often as is found necessary with
any given media at that stage in the cycle. A
cycle can be anywhere from 1 year to 20 years
depending on how long the current media last and
how long drives are available to read the current
media. As soon as either the current media start
to degrade or the drives are no longer manufactured,
then it is time to shift to new media (of the same
type) or to new drives that use different media.
Over the past 20 years, I have noticed that I have
needed to shift from tapes to magneto optical media
and now to DVDs. I expect that within 10 years I
may again need to shift. The key is to always keep
a bit ahead of the problem rather than waiting until
you have problems such as NASA has with old tapes
that can no longer be read, using tape drives that
rarely work in any case.
Since media seems to change more quickly these days
than previously, it just may be necessary to adopt a
2 to 3 year cycle and make additional backups
to ensure that the files are not ever lost for
as long as the files are considered useful.
Fortunately, since storage capacity is becoming
larger and larger, the problems for a given set
of files actually become smaller. For example,
I still use a Pentium III (started in 2002).
Initially, I settled for a CD burner, but that
was already too small from the start for my system
for easy archives since my backup image files were
about 1 GByte. Even after 3 years as I have been
able to eliminate redundant files, the backup image
is still over 800 MBytes. Fortunately, about 2 years
ago, the DVD burners (I use a P05) fell to within
almost reasonable cost (for a hobby budget) and I
started to share a drive every 4 months at which
point I copied 4 monthly backup images to the DVD-R.
Repeating every 4 months means I use only 3 DVD-R
blanks a year which is a very reasonable yearly cost.
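As a rough check of that arithmetic, here is a small sketch. It assumes a single-layer DVD-R with a nominal capacity of 4.7 GB and takes "about 1 GByte" as 10**9 bytes; the constant and function names are only illustrative and are not from the original post.

```python
# Rough capacity check: how many monthly backup images fit on one DVD-R,
# and how many blanks a year of monthly backups needs.
# Assumptions (not stated in the post): single-layer DVD-R, nominal 4.7 GB,
# and an image size of exactly 10**9 bytes.
DVD_R_BYTES = 4_700_000_000
IMAGE_BYTES = 1_000_000_000


def images_per_disc(disc_bytes, image_bytes):
    """Whole images that fit on one disc (ignores filesystem overhead)."""
    return disc_bytes // image_bytes


def discs_per_year(images_per_year, per_disc):
    """Blanks needed per year, rounded up to a whole disc."""
    return -(-images_per_year // per_disc)


per_disc = images_per_disc(DVD_R_BYTES, IMAGE_BYTES)
print(per_disc)                      # 4 images per disc
print(discs_per_year(12, per_disc))  # 3 discs per year
```

Four images of about 1 GByte also leave some unused space on the disc, which fits with the advice further down about not filling the complete DVD.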
I just mention my personal experience as an example.
Other users will have their own requirements. By
the way, I originally started on a PDP-11 with
RX03 being my primary backup (not mentioned above).
After about 5 years I shifted to the TK25 (DC600A
tapes of 50 MBytes) followed by the TK70 (tapes of
290 MBytes). Since all backup files were 32 MByte
images of 32 MByte RT-11 partitions (without compression),
the TK70 was a big improvement. After I switched
to a magneto optical backup (Sony SMO S-501), the
big advantage was that I could update the SMO media
with just the few changes and then verify the complete
media against the 8 RT-11 partitions per each side
of the SMO media. When I shifted to using E11 under
Windows 98 SE, the same SMO media could be used as
backup on Pentium systems. When the DVD burner finally
became economical, in addition to the files under the
FAT32 directory for Windows 98 SE, I also made a
backup of the PDP-11 files which currently occupy
only a single DVD.
One additional suggestion is that before I burn the
backup images to the DVD, I also make up a file with
the 4 MD5 values for the 4 image files. This ensures
that when I verify that the files can still be read that
the files are also correct - assuming that the file
with the MD5 values can also be read. I ensure that
by keeping all MD5 files on line and writing all of
them to all future backup DVDs.
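The step of making up the MD5 file before burning could be sketched as follows. This is only one possible way to do it, using Python's standard hashlib module; the function names, the file names in the usage comment, and the "hash, two spaces, file name" line layout (borrowed from the common md5sum tool) are assumptions, not the method actually used in the post.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks so large images fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(image_paths, manifest_path):
    """Write one 'md5  filename' line per image and return the lines written."""
    lines = [f"{md5_of_file(p)}  {Path(p).name}" for p in image_paths]
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return lines


# Hypothetical usage, with made-up file names:
# write_manifest(["jan.img", "feb.img", "mar.img", "apr.img"], "backup.md5")
```

The manifest is a small text file, so keeping every past manifest on line and copying all of them onto each new DVD costs almost no space.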
What I suspect I will end up doing after 4 years
is to make up a DVD with 4 backup image files from
December 31st of each year which will allow me to
downgrade the importance of the monthly backup files.
AGAIN, the key point is that an archived file can't be
assumed to be readable after more than 2 or 3 years.
It must be verified in some manner - as in my example
by checking that the MD5 value is still correct. And
after 4 or 5 years, it is also probably prudent to
make a more recent copy such as I have suggested.
And finally, after 10 years, it is probably essential
to consider switching to the then-current media and drives
if the old ones are no longer being manufactured.
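A periodic verification pass along these lines might look like the sketch below. It assumes a manifest of "md5  filename" lines and image files sitting in one directory (for example the mounted DVD); the function names and the three status strings are illustrative choices, not anything specified in the post.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, image_dir):
    """Re-read every file named in the manifest and compare MD5 values.

    Returns a list of (name, status) pairs, where status is 'ok',
    'mismatch' (readable but changed) or 'unreadable' (missing or I/O error).
    """
    results = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(None, 1)
        name = name.strip()
        try:
            actual = md5_of_file(Path(image_dir) / name)
        except OSError:
            results.append((name, "unreadable"))
            continue
        status = "ok" if actual == expected.lower() else "mismatch"
        results.append((name, status))
    return results


# Hypothetical usage: verify_manifest("backup.md5", "/mnt/dvd")
```

Any result other than 'ok' is the signal to make a fresh copy from another backup while one is still readable.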
Naturally, all the other aspects of making multiple
copies and not filling the complete DVD also apply.
Probably the only new items I have really mentioned
are the use of the MD5 value and the emphasis that
archived files must be periodically verified to be
sure that they are still readable BEFORE they degrade.
If you attempted to send a reply and the original e-mail
address has been discontinued due to a high volume of junk
e-mail, then the semi-permanent e-mail address can be
obtained by replacing the four characters preceding the
'at' with the four digits of the current year.