Reading an MFM hard drive with a floppy disc analyser

Philip Pemberton classiccmp at
Fri Dec 5 17:12:00 CST 2008

Jules Richardson wrote:
> Well, extra lines isn't a big deal I suppose - heck, you could even do 
> that bit 'manually' and read platters one at a time, flipping a switch 
> between each pass.

True, though the hardware I'm actually thinking of using is my USB 
hack-and-bodge floppy reader. FPGA, PICmicro, and a few buffers/level translators.

The discrete FDD I/O stage is gone though -- mainly because in an overvoltage 
condition, it's highly likely the 74LS14 input buffer would get deep-fried... 
Thus rendering the "slightly more rugged" output drivers utterly pointless.

I've kept to the original spec for the connector though -- a 40-pin IDC with 
the first 34 pins forming a Shugart-compatible interface, and the last 6 
defined as user I/O (each pin configurable as either input or output).

> Well, run some figures as a sanity-check... say 256*8 bits per sector, 
> 32 sectors/track on a formatted ST506 - 32 x 256 x 8 = 64Kb of actual 
> data per track.
> 83.35 seems sensible as some kind of theoretical maximum, given storage 
> of sector headers and the like.

And MFM guarantees at most one flux transition per (data) bit cell, meaning 
~83.35 kbit per track isn't actually that strange a figure.
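A quick check of that figure (assuming a 5Mbit/s raw MFM rate and a 3600rpm 
spindle, i.e. 16.67ms per revolution):

```python
# Sanity check of the raw bits-per-track figure for an ST506-style drive.
# Assumptions: 5 Mbit/s raw MFM data rate, 3600 rpm spindle.
data_rate_bps = 5_000_000      # raw MFM bit rate
rev_time_s = 60 / 3600         # one revolution at 3600 rpm (~16.67 ms)

bits_per_track = data_rate_bps * rev_time_s
print(f"{bits_per_track / 1000:.2f} kbit per track")  # ~83.33 kbit
```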

> I'm not sure what sort of granularity is needed, though - presumably the 
> CW uses counters geared toward expected floppy rates, so most likely 
> can't represent timing gaps accurately enough for hard disk speeds?

The read clock on a CW is about 8MHz or so, if memory serves. Effectively 
that's ~16 clocks per bit cell time (at 500kbps MFM that is). The hardware I'm 
building up is going to sample at 20 or 40MHz (software selectable), which is 
40 (or 80) clocks per bitcell on 500k MFM, or 4 (or 8) on 5Mbps MFM...
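Those ratios are just the sample clock divided by the bit rate -- a quick 
sketch:

```python
# Clocks per MFM bit cell = sample clock / data rate (assuming one bit
# cell per data bit, i.e. cell time = 1 / rate).
def clocks_per_cell(sample_hz, data_bps):
    return sample_hz / data_bps

print(clocks_per_cell(8e6, 500e3))   # 16.0 (Catweasel-ish clock, 500k floppy MFM)
print(clocks_per_cell(40e6, 500e3))  # 80.0
print(clocks_per_cell(40e6, 5e6))    # 8.0  (5Mbps hard disk MFM)
```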

Adding another sampling rate to the hardware probably wouldn't be that hard -- 
add another input to the clock-mux and either add a PLL multiplier to the 
logic (there are four on an Altera Cyclone II 2C20) or another TTL oscillator 
to the PCB.

>> Do my calculations look right?
>> With 304 cylinders and 4 heads, that works out to 304 * 4 * 83.35 = 
>> 101353.6 Kbits, which does seem awfully low to me...
> 101353.6 Kb is approx 98Mb, or approx 12MB - so that seems not 
> unreasonable.
> However, remember that's *sample count*, not total storage.

True. An FDI file containing the sample data isn't likely to be /that/ much 
bigger though.

But it'll take a bloody age transferring that much data over USB Full Speed 
(12Mbps peak signalling, and rather less in practice for bulk transfers).
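Rough numbers for that transfer (assuming the 304 x 4 x 83.35kbit sample 
count above, and a hand-waved ~1.2MB/s of usable Full Speed bulk throughput):

```python
# Estimate total sample data for one drive, and the USB transfer time.
# Assumptions: 304 cylinders x 4 heads, ~83.35 kbit of samples per track,
# ~1.2 MB/s usable bulk throughput (a guess, not a measured figure).
tracks = 304 * 4
bits_total = tracks * 83.35e3
bytes_total = bits_total / 8

print(f"{bytes_total / 1e6:.1f} MB total")         # 12.7 MB
print(f"{bytes_total / 1.2e6:.0f} s at 1.2 MB/s")  # 11 s
```

Per-track command turnaround and protocol overhead will stretch that out 
considerably in practice.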

> So, what's an upper boundary on bits per sample? I suppose you could 
> have an entire track with just one sample on it right at the end, so 
> your upper boundary is the 16.67ms figure.

MFM is a (1,3)RLL code. That means you can have a minimum of 1 and a maximum 
of 3 empty bit cells between flux transitions.

Best-case is a stream of 1s, which is NT - NT - NT...
   (1 no-transition period between each transition)
Worst-case is a repeating 1-0 bit stream which is NT-NN - NT-NN..
   (3 no-transition periods between each transition)

You can also have 2 no-transition periods separating a pair of transitions -- 
that's a 1-0-0 sequence:  NT-NN-TN - NT-NN-TN...
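The rule behind those patterns can be sketched in a few lines of Python 
(same notation as above: two symbols per bit cell, clock half then data 
half, 'T' = flux transition, 'N' = none):

```python
# Sketch of the MFM encoding rule. A data 1 puts a transition in the data
# half of the cell; a data 0 gets a clock transition only when the
# previous bit was also 0.
def mfm_flux(bits, prev=0):
    out = []
    for b in bits:
        clock = 'T' if (b == 0 and prev == 0) else 'N'
        data = 'T' if b == 1 else 'N'
        out.append(clock + data)
        prev = b
    return '-'.join(out)

print(mfm_flux([1, 1, 1]))     # NT-NT-NT    (best case: 1 gap)
print(mfm_flux([1, 0, 1, 0]))  # NT-NN-NT-NN (worst case: 3 gaps)
print(mfm_flux([1, 0, 0]))     # NT-NN-TN    (2 gaps)
```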

> More typically though samples are going to occur *far* more frequently - 
> so to cope with all possible situations you (in theory) need a very 
> large sample length (which in 99.9999% of cases is going to contain a 
> lot of 0s in the upper bits!)

The RAM I'm using is 4Mbit -- i.e. 512Kbyte, but mapped into 16-bit words. So 
you have 262144 16-bit words. One status bit (for the index pulse) leaves 15 
bits for the counter. The counter is rigged to lock at 0x7FFF instead of 
rolling back to 0 (which saves a bit).

15 bit counter = span of 0 to 32767 counts.
40MHz = 25ns per count
25ns * 32767 = 819175 ns = 819.175 us

Hmm, for floppy discs you'd have to slow it down a bit (possibly to 20MHz, 
which would be 1638.35 us, or about 1.6 milliseconds), but for MFM (and RLL?) 
hard drives this should be more than enough.
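The same arithmetic as a function (assuming the 15-bit counter that 
saturates at 0x7FFF, as above):

```python
# Longest flux gap the saturating 15-bit counter can time, per clock rate.
def max_gap_us(clk_hz):
    tick_ns = 1e9 / clk_hz          # 25ns/count at 40MHz, 50ns at 20MHz
    return tick_ns * 0x7FFF / 1000  # counter locks at 32767

print(max_gap_us(40e6))  # 819.175
print(max_gap_us(20e6))  # 1638.35
```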

> Some possible approaches I can see:
> 1) Just have an absolutely colossal buffer with a large number of bits 
> per sample

SRAM is relatively expensive though. SDRAM isn't, but then you have to deal 
with refreshing and other "fun" things like that.

> 2) Use the first bit (or bits) of each sample as a flag to indicate the 
> resolution of the following sample data (essentially toss away lower 
> bits for lengthy samples)

Hmm, use the top 2 bits to specify the timing resolution, and the rest as a 
timing value. Drop between 0 and 3 bits depending on how far over 14 bits your 
count has gone...

That's going on my "try this" list.
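One possible layout for that scheme, just to make the idea concrete (field 
sizes and names are my assumption, not a fixed design): the 2-bit scale 
field says how many low-order count bits were dropped, and the remaining 
14 bits hold the shifted count.

```python
# Variable-resolution sample word: scale (top 2 bits) = number of dropped
# low-order bits (0-3), value (low 14 bits) = the shifted count.
def encode(count):
    for scale in range(4):
        if (count >> scale) < (1 << 14):
            return (scale << 14) | (count >> scale)
    raise ValueError("count too large even at the coarsest resolution")

def decode(word):
    return (word & 0x3FFF) << (word >> 14)

print(decode(encode(100)))      # 100    (exact, scale 0)
print(decode(encode(100_001)))  # 100000 (low 3 bits dropped, scale 3)
```

That stretches the reach of a 16-bit word from 32767 counts (15-bit plain 
counter) to (2^14 - 1) << 3 = 131064, at the cost of 8-count granularity at 
the long end.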

> Personally ISTR looking at the second and third options - the third is 
> probably the better of the two, but needs a bit more intelligence (and 
> at the time I was wondering if I could do this in pure logic without a 
> CPU). I'm not sure what approach the CW takes (but at floppy rates the 
> amount of buffer needed is a lot smaller, so maybe it just gets away 
> with a fixed sample size and large enough buffer).

It's a 7-bit counter bolted onto a 128Ksample RAM buffer. The top bit stores 
the state of the index pulse at the time the transition was stored, and the 
lower 7 bits store the timing value.
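So each sample byte would unpack something like this (a sketch of the bit 
layout just described):

```python
# Split a Catweasel-style sample byte: top bit = index pulse state at the
# time the transition was stored, low 7 bits = timing count.
def unpack(sample):
    return bool(sample & 0x80), sample & 0x7F

print(unpack(0x85))  # (True, 5): index asserted, count 5
```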

> For some reason when I looked at this I think I believed I could get 
> away with 9 bits per sample including a flag bit dictating two different 
> sample resolutions (i.e. somewhere around 1Mb of buffer) - unfortunately 
> all my notes on this are stuck in storage right now though (plus ideally 
> I'd go for a microcontroller approach with a variable sample length I 
> think)

Ultimately it depends on what you're sampling. A 3.5" DSDD floppy, sampled at 
7.08MHz ends up with few samples above ~64 counts (IIRC, my density graphs are 
on the laptop, which is switched off). Based on that, 7 bits would probably be 
fine up to 14MHz, 8 bits up to 28MHz.

Assuming, of course, that the disc is "perfectly" formatted. As in, the entire 
track is MFM.
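Extrapolating that ceiling (assuming the maximum observed count scales 
linearly with the sample clock, from the ~64-count figure at 7.08MHz):

```python
# Extrapolate the counter-width ceiling from one measured point.
# Assumptions: max observed count (~64) scales linearly with sample clock;
# baseline is a 3.5" DSDD floppy sampled at 7.08 MHz.
def max_clock_hz(bits, base_clk=7.08e6, base_max=64):
    """Highest sample clock an N-bit counter covers before saturating."""
    return base_clk * ((1 << bits) - 1) / base_max

print(f"7 bits: up to ~{max_clock_hz(7)/1e6:.0f} MHz")  # ~14 MHz
print(f"8 bits: up to ~{max_clock_hz(8)/1e6:.0f} MHz")  # ~28 MHz
```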

> Recreating the data onto another drive is a different matter, of course 
> - but I've always been more interested in salvaging existing data onto 
> more modern media for analysis.

Half the problem is that the timing is going to get progressively worse the 
further down the line you go. Kinda like what happens when you dub an audio 
tape. By the time you've gotten to the point of having a 3rd- or 
4th-generation copy, the audio is almost completely masked by the background hiss.
