Software-based floppy disc data separator
cisin at xenosoft.com
Sat Jun 19 13:53:39 CDT 2010
On Sat, 19 Jun 2010, Kieron Wilkinson wrote:
> It's certainly nice to talk to people who do do that though! I don't
> envy you, but I'm sure it's interesting and gratifying work (?)
It's a helluva hobby :-)
It's great when it will contribute towards paying the bills.
> > the bulk of floppies in existence, they don't represent the variety
> > of variations in filesystems, encodings or layouts. Most of the
> > details of those dedicated systems (say, from someone's PBX) remain
> > unknown to the current day. There are some people lurking who have
> > knowledge of certain file formats (e.g. embroidery machines) who can
> > assist in translating the data to something meaningful. But often,
> > the best you can get is "this is what the machine does when you feed
> > it this diskette".
. . . and the data and information from the users are not always reliable.
Such as when they create a single sided 35 track sample disk to analyze,
but record it on a used 40 track double sided disk, send you a disk to
analyze of a completely alien file system that happens to be jammed full
of remains of deleted files, or tell you that "the computer is a
Lear-Sigler with a Northstar Horizon external drive", or a "Pentabs"
computer (rebranded Vector-Graphic).
Just getting them to NAME the computer is a struggle.
> > If anyone wants to try their hand at it, I can send a time-domain
> > (i.e. Catweasel) sample of a Lanier 32-sector M2FM (as best I can
> > determine it) WP disk. You have only to figure out the character
> > set, file system, floppy encoding and file format...
> Nasty. Might as well be a cryptanalyst for that sort of thing! I
or just a puzzle solver
> wonder if some statistical-based analysis would help, but perhaps you
> are way ahead of me on that?
Chuck is very fond of histograms.
On the other hand, I hardly ever continued with any format that wasn't
going to be possible to convert using relatively stock hardware. "If it
isn't IBM/WD, then just file the disk in the appropriate section of
I played around with some probabilistic code to come up with what to try
first, particularly in finding and identifying software sector interleaves
(which sector is used after sector number 1?). Feed the code the start and
end bytes of sectors and have it identify which ones are most likely to be
"half a worm" (start of a "word" at the end of one sector, end of the word
at the beginning of another sector). In the absence of adequate language
text (or with excessively unfamiliar languages), multibyte machine language
instructions are adequate.
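
For anyone who wants to experiment, here is a minimal Python sketch of that
kind of boundary scoring. It is not the original code. It assumes ASCII
text, a small word list supplied by the caller, and one dict per track
mapping sector number to sector contents; the function names and the 1/2
scoring weights are made up for illustration.

```python
import string
from collections import Counter

# Byte values of ASCII letters (iterating over bytes yields ints).
LETTERS = set(string.ascii_letters.encode("ascii"))

def tail_fragment(data):
    """Trailing run of ASCII letters at the end of a sector."""
    i = len(data)
    while i > 0 and data[i - 1] in LETTERS:
        i -= 1
    return data[i:]

def head_fragment(data):
    """Leading run of ASCII letters at the start of a sector."""
    i = 0
    while i < len(data) and data[i] in LETTERS:
        i += 1
    return data[:i]

def join_score(a, b, words):
    """Score the guess that sector b follows sector a.

    0: no word spans the boundary
    1: letters on both sides of the boundary (a possible split word)
    2: the joined fragment is in the word list (arbitrary weight)
    """
    tail, head = tail_fragment(a), head_fragment(b)
    if not tail or not head:
        return 0
    joined = (tail + head).decode("ascii").lower()
    return 2 if joined in words else 1

def rank_successors(tracks, words):
    """Accumulate scores for every ordered sector pair over all tracks.

    tracks is a list of dicts, each mapping sector number -> bytes.
    Returns (pair, score) tuples, best first.
    """
    votes = Counter()
    for sectors in tracks:
        for i, a in sectors.items():
            for j, b in sectors.items():
                if i != j:
                    score = join_score(a, b, words)
                    if score:
                        votes[(i, j)] += score
    return votes.most_common()
```

Pairs that keep scoring across many tracks are the ones worth trying first;
a single hit means very little.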
"Ah HA! The code thinks that there is a high probability that sector 7
follows sector 4. 5 instances of probable sequence on the disk, 3 of
which are words!, and 1 of the non-words started with an upper case
character. Therefore, 1,4,7? But is it 1,4,7,2,5,8,3,6,9 or
1,4,7,3,6,9,2,5,8?" YES, both of those sequences exist.
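
Choosing between two candidate sequences like those can be sketched as
summing the pairwise evidence along each one. This is only an illustration
in Python: the vote counts would come from whatever boundary scoring is in
use, and the function names here are hypothetical.

```python
def sequence_support(order, votes):
    """Sum the evidence for every consecutive pair in a candidate order."""
    return sum(votes.get(pair, 0) for pair in zip(order, order[1:]))

def best_order(candidates, votes):
    """Return the candidate with the most support from the pairwise votes."""
    return max(candidates, key=lambda order: sequence_support(order, votes))

# The two sequences mentioned above.
CANDIDATES = [
    (1, 4, 7, 2, 5, 8, 3, 6, 9),
    (1, 4, 7, 3, 6, 9, 2, 5, 8),
]
```

Both candidates share 1,4,7, so the deciding evidence is whatever follows
sector 7 (2 or 3) and the pairs after that.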
'e' is the most probable character in English language text. But if a
sector ends with a 'q', then 'u' or '.' are the most probable subsequent
characters. BTW, 'e' is NOT the most probable character following a space
- look at the thicknesses of different starting letters in the dictionary!
An upper case character is much more likely to follow a space that follows
another space or a period, etc.
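
Those "what follows what" probabilities can be estimated from any sample of
known text. A minimal Python sketch, assuming a plain string as the sample
and using made-up function names; a real table would need a much larger
sample, and the two-character context (space after a period) would need
trigram counts rather than the pairs shown here.

```python
from collections import Counter, defaultdict

def bigram_counts(text):
    """Count, for each character, which characters follow it in text."""
    follows = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        follows[prev][cur] += 1
    return follows

def successor_probability(follows, prev, cur):
    """Estimated probability that cur follows prev (0.0 if prev unseen)."""
    total = sum(follows[prev].values())
    return follows[prev][cur] / total if total else 0.0
```

With such a table, the last byte of one sector and the first byte of each
candidate next sector give a score even when no whole word spans the
boundary.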
But for file information, nothing beats the human mind.
Grumpy Ol' Fred cisin at xenosoft.com