"File types"

Sean Conner spc at conman.org
Tue Aug 29 11:36:15 CDT 2006

It was thus said that the Great Don once stated:
> Even in the limit case (i.e. only one application exists with which you
> could use to "open" *all* JPEGs), is there still any reason why you
> need to name a file:
> PictureOfUsSkiing.<photograph>
> Granted, you might have:
> Skiing.<photograph>           // i.e. photo
> Skiing.<expense_tabulation>   // i.e. spreadsheet
> Skiing.<text>                 // i.e. invitation to ski trip
> ...
> But, they could still all be *called* "Skiing" -- with some
> other attribute (e.g. file creator) that actually differentiates
> them.

  Heh.  About ten years ago I got into a similar discussion with some
friends about this, and even designed a file system that not only could
support user added metadata, but even the concept of a "name" was fluid (you
could actually use an audio clip as the "name", or a graphic).  It would
also allow one to "cd" into a jpeg (for instance) and see all the segments
that make up a JPEG file (once you concede that a directory is nothing more
than a special file that "points" or "contains" other files, then this type
of stuff just kind of falls out) or even "cd" into an executable and see the
code and data segments (which means no special tools required to support
"fat" binaries, and if you want to strip out the 68K code portions because
you're on an x86 platform, you can use the regular delete command from the
shell, stuff like that).

  But how would I copy a file "named" <FX of screeching tires> to some
other system, like Unix?  It's a source file containing C code, but the
metadata includes the latest version number, which project it belongs to,
the owner (me), and an extensive list of changes to the file since it was
first created.  

> So, I'm still wondering *why* this came to be (hence the
> historical reference)

  Probably had something to do with the popularity of Unix and MS-DOS.  Unix
started out treating files as a stream (or bag) of bytes---no structure was
implied or enforced by the operating system and from what I understand, at
the time that was pretty revolutionary.  I'm also guessing that at the time,
you really only had three different types of files (excluding the special
device files)---executables, object code and text files.  And even *if* you
wanted to waste some disk space on tracking file type information, how much
space do you set aside in the inode for such information?  (my guess is that
at the time, the creators of Unix didn't think such information was all that
important and besides, with 14 character file names, why not just let
convention win and stick the "type" as an extension?)

  Now, shuffle over to CP/M (precursor to MS-DOS) and there, the three
letter extension *is* the file type---in all the documentation I've read
about CP/M, the three character extension is used to designate the file
type (and said extension restricted to uppercase letters and digits, plus a
little punctuation, padded with trailing spaces).  It was a separate field
from the name (which was
eight characters long if I recall).  MS-DOS picked up on this, and yes, if
you check the documentation for MS-DOS, the three letter extension is again
a separate field in the directory entry, and it followed the same
restrictions as CP/M.  
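
  To make the "separate field" point concrete, here is a minimal sketch of
how a name could be split into the two fixed-width, space-padded fields of
an 8.3 directory entry.  The function names are made up for illustration,
and the checks are simplified (no validation of individual characters); it
is not taken from any CP/M or MS-DOS source.

```python
def to_8_3_fields(filename):
    """Split 'name.ext' into an 8-byte name field and a 3-byte
    extension field, both upper-cased and padded with spaces."""
    name, _, ext = filename.upper().partition(".")
    if not name or len(name) > 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"{filename!r} does not fit an 8.3 entry")
    return name.ljust(8).encode("ascii"), ext.ljust(3).encode("ascii")


def from_8_3_fields(name_field, ext_field):
    """Rebuild the display name from the two stored fields.
    The dot is not stored; it is only added back for display."""
    name = name_field.decode("ascii").rstrip()
    ext = ext_field.decode("ascii").rstrip()
    return f"{name}.{ext}" if ext else name


print(to_8_3_fields("vacation.jpg"))
print(from_8_3_fields(b"README  ", b"   "))
```

  The extension comes back out of its own field, which is why it can act
as a (very small) type field rather than just being part of the name.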

  So MS-DOS *does* store file type information in the directory entry
(however restricted it is).  

  Now, this metainformation about the file is easy to carry across between 
systems (like Unix, and even the Macintosh) if you tack on the extension as
part of the "name" of the file.  So, if you move a file "vacation.jpg" from
Unix (which doesn't care what type of file it is) to MS-DOS, automagically
it gets the *type* when copied (JPEG image file).  
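
  As a rough illustration of the type riding along inside the name, the
sketch below uses Python's standard mimetypes module to recover a type from
nothing but the file name.  The helper name type_from_name is hypothetical;
the point is only that no metadata beyond the name itself is consulted.

```python
import mimetypes
import os.path


def type_from_name(filename):
    """Return (extension, guessed MIME type) using only the file name."""
    _root, ext = os.path.splitext(filename)
    mime, _encoding = mimetypes.guess_type(filename)
    return ext, mime


print(type_from_name("vacation.jpg"))   # extension present: type recoverable
print(type_from_name("vacation"))       # no extension: nothing to go on
```

  Drop the extension and the name alone tells the receiving system nothing
about what kind of file it is.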

  It's harder if this metainformation is stored as something else (like the
4 byte type code on the Macintosh, which sits next to a 4 byte creator code).
So, on a Mac you have the file "vacation" but the type is (as an example)
0x4A504547 (ASCII for 'JPEG').  It's meaningless on Unix, and at four bytes
it's a value that won't fit into
the MS-DOS extension field.  
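
  A small sketch of why that value does not map onto a three-character
extension: unpacking the example number as a big-endian 32-bit integer
gives four characters.  The helper names are invented for illustration,
and decoding as plain ASCII is an assumption that happens to hold for this
example value.

```python
import struct

DOS_EXTENSION_LENGTH = 3  # width of the MS-DOS extension field


def type_code_to_text(code):
    """Render a 32-bit type code as its four characters (big-endian)."""
    return struct.pack(">I", code).decode("ascii")


def fits_dos_extension(text):
    """True if the text would fit in a three-character extension field."""
    return len(text) <= DOS_EXTENSION_LENGTH


code = 0x4A504547
text = type_code_to_text(code)
print(text, fits_dos_extension(text))
```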

  BeOS (on topic actually) had a very cool system whereby the user could
attach arbitrary meta information to a file, and the file types were stored
as MIME types (so "vacation" would have a type of "image/jpeg").  But again,
how can one transfer user added metadata of a file to another system?

  -spc (So it just kind of evolved that the file type is tacked on to the
	end of the file name.)

More information about the cctalk mailing list