"File types"

Don THX1138 at dakotacom.net
Tue Aug 29 14:57:58 CDT 2006


Sean Conner wrote:
> It was thus said that the Great Don once stated:
>> Even in the limit case (i.e. only one application exists with which you
>> could "open" *all* JPEGs), is there still any reason why you
>> need to name a file:
>>
>> PictureOfUsSkiing.<photograph>
>>
>> Granted, you might have:
>>
>> Skiing.<photograph>           // i.e. photo
>> Skiing.<expense_tabulation>   // i.e. spreadsheet
>> Skiing.<text>                 // i.e. invitation to ski trip
>> ...
>>
>> But, they could still all be *called* "Skiing" -- with some
>> other attribute (e.g. file creator) that actually differentiates
>> them.
> 
>   Heh.  About ten years ago I got into a similar discussion with some
> friends about this, and even designed a file system that not only could
> support user added metadata, but even the concept of a "name" was fluid (you
> could actually use an audio clip as the "name", or a graphic).  It would
> also allow one to "cd" into a jpeg (for instance) and see all the segments
> that make up a JPEG file (once you concede that a directory is nothing more
> than a special file that "points" or "contains" other files, then this type
> of stuff just kind of falls out) or even "cd" into an executable and see the
> code and data segments (which means no special tools required to support
> "fat" binaries, and if you want to strip out the 68K code portions because
> you're on an x86 platform, you can use the regular delete command from the
> shell, stuff like that).

Excellent!  In my case, directories are active objects
and the "file system" is really more appropriately called
the "name space".  Since directories are active, the operations
that can be applied to a directory (i.e. that the directory
applies to *itself*) can be unconventional.  And, the objects
that it can "contain" (reference?) can be quite varied.
E.g., some objects may be "volatile", others "static", etc.

>   But how would I copy a file "named" <FX of screeching tires> to some
> other system, like Unix?  It's a source file containing C code, but the
> metadata includes the latest version number, which project it belongs to,
> the owner (me), and an extensive list of changes to the file since it was
> first created.  

This would be the problem of the "export method".  E.g., how does
a digital camera copy its files to your $computer?  In the
camera, there is no notion of JPEG, TIFF, etc.  They are just
values from a CCD stored in memory in some convenient order
for the hardware to generate and process.  Obviously (?), the
data isn't represented internally *as* a JPEG *before* image
processing is done (e.g., jitter reduction, color compensation,
etc.).  Rather, JPEG is just "an acceptable way" of exporting
the data (rather than some other oddball format that might
require the user to run some "converter" on the data prior
to use).

>> So, I'm still wondering *why* this came to be (hence the
>> historical reference)
> 
> Probably had something to do with the popularity of Unix and MS-DOS.  Unix
> started out treating files as a stream (or bag) of bytes---no structure was
> implied or enforced by the operating system and from what I understand, at
> the time that was pretty revolutionary.  I'm also guessing that at the time,
> you really only had three different types of files (excluding the special
> device files)---executables, object code and text files.  And even *if* you

But, even "text files" have different types.  E.g., shell scripts
are "different" than "ascii text".  (and this is determined by
inspecting the file's *contents*, not *name*!)
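
To make the "contents, not name" point concrete, here is a minimal
sketch of content-based typing, loosely in the spirit of file(1).
The function name sniff_type and the coarse labels it returns are
made up for illustration; a real tool knows far more signatures.

```python
def sniff_type(data: bytes) -> str:
    """Guess a coarse file type from the bytes alone; no file name is consulted."""
    if not data:
        return "empty"
    if data.startswith(b"#!"):
        # A leading "#!" marks an interpreter script (shell, etc.).
        return "script"
    if data.startswith(b"\xff\xd8\xff"):
        # JPEG streams begin with the SOI marker FF D8, followed by another marker.
        return "jpeg"
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "data"
    return "ascii text"


if __name__ == "__main__":
    print(sniff_type(b"#!/bin/sh\necho hello\n"))
    print(sniff_type(b"just some words\n"))
```

Note that both sample inputs could be stored under any name at all
(foo.foo included) without changing the answer.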

> wanted to waste some disk space on tracking file type information, how much
> space do you set aside in the inode for such information?  (my guess is that
> at the time, the creators of Unix didn't think such information was all that
> important and besides, with 14 character file names, why not just let
> convention win and stick the "type" as an extension?)

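But you're already using 2 - 4 bytes of the file name to track this
type!  I.e., '.' followed by 1 - 3 (or more) characters of extension.
(On Unix those bytes sit in the directory entry rather than the
inode, but the space is spent either way.)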
And, for all practical purposes, you are poorly utilizing those
4 bytes!  The first conveys no information other than "an extension
follows".  And, of the 1 - 3 (typical) that follow, you really only
see 36^3 different file "types" (i.e. case neutral, and typically
only alphanumerics).  So, you're storing 46656 data types in a field
that could store 4294967296.  (i.e. you could use 2 bytes instead
of the 4 you are using)
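
The arithmetic can be checked with a few lines of Python.  This is
only a sketch under the assumptions stated above (case-neutral
alphanumerics, at most three characters after the dot); the variable
names are mine.

```python
ALPHABET = 36  # 26 case-neutral letters plus 10 digits

# Extensions of exactly three characters.
three_char_types = ALPHABET ** 3

# Extensions of one, two, or three characters.
one_to_three_char_types = sum(ALPHABET ** n for n in range(1, 4))

# Distinct values a 4-byte field can hold.
four_byte_values = 2 ** 32

# Bits (and whole bytes) needed to number every 1-3 character extension.
bits_needed = (one_to_three_char_types - 1).bit_length()
bytes_needed = (bits_needed + 7) // 8

print(three_char_types)         # 46656
print(one_to_three_char_types)  # 47988
print(four_byte_values)         # 4294967296
print(bits_needed, bytes_needed)  # 16 2
```

Even counting the shorter extensions, everything fits in 16 bits, so
the "2 bytes instead of 4" estimate holds.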

>   Now, shuffle over to CP/M (precursor to MS-DOS) and there, the three
> letter extension *is* the file type---in all the documentation I've read
> about CP/M, the three character extension is used to designate the file
> type (and said extension restricted to the letters 'A' through 'Z' (and
> any trailing space)).  It was a separate field from the name (which was
> eight characters long if I recall).  MS-DOS picked up on this, and yes, if
> you check the documentation for MS-DOS, the three letter extension is again
> a separate field in the directory listing, and it followed the same
> restrictions as CP/M.  

But CP/M didn't *do* anything with the file extension (for all
practical purposes).  I.e. you could have a text file called
foo.foo and your text editor would gladly open it.  If an
*application* wanted to insist on particular file extensions
then it could do so -- usually pissing you off in the process
("No, Editor, I am writing an INCLUDE file, I want to name it foo.inc
*not* foo.doc")

>   So MS-DOS *does* store file type information in the directory entry
> (however restricted it is).  
> 
>   Now, this metainformation about the file is easy to carry across between 
> systems (like Unix, and even the Macintosh) if you tack on the extension as
> part of the "name" of the file.  So, if you move a file "vacation.jpg" from
> Unix (which doesn't care what type of file it is) to MS-DOS, automagically
> it gets the *type* when copied (JPEG image file).  

This, IMO, is the only "real" reason that file type has migrated
into the namespace.  It lets everyone avoid the issue of
moving files between systems by simply stating that files are
*just* bytes -- even if your applications think otherwise.
I.e. if your MS machine doesn't know what .HQX means, <shrug>.
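
As an illustration of the type riding along inside the name, here is
a small sketch using Python's standard mimetypes module, which maps
an extension to a MIME type without ever looking at the bytes.  The
wrapper name type_from_name is hypothetical.

```python
import mimetypes


def type_from_name(filename: str):
    """Return the MIME type implied by the file name alone, or None."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


if __name__ == "__main__":
    print(type_from_name("vacation.jpg"))  # image/jpeg
    print(type_from_name("vacation"))      # None
```

Strip the extension and the type is simply gone, which is the
<shrug> case: the receiving system sees *just* bytes.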

>   It's harder if this metainformation is stored as something else (like the
> 4 byte type field on the Macintosh---not sure what it's called exactly). 
> So, on a Mac you have the file "vacation" but the type is (as an example)
> 0x4A504547.  It's meaningless on Unix, and it's a value that won't fit into
> the MS-DOS extension field.  
> 
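
For what it's worth, the example value above is just four ASCII
characters packed into 32 bits.  A minimal sketch of the conversion,
assuming big-endian byte order and hypothetical helper names:

```python
def fourcc_to_text(code: int) -> str:
    """Unpack a 32-bit type code into its four ASCII characters."""
    return code.to_bytes(4, "big").decode("ascii")


def text_to_fourcc(text: str) -> int:
    """Pack exactly four ASCII characters into a 32-bit type code."""
    raw = text.encode("ascii")
    if len(raw) != 4:
        raise ValueError("type code must be exactly four characters")
    return int.from_bytes(raw, "big")


if __name__ == "__main__":
    print(fourcc_to_text(0x4A504547))    # JPEG
    print(hex(text_to_fourcc("JPEG")))   # 0x4a504547
```

So the value reads "JPEG", one character too long for a three
character extension field.
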
>   BeOS (on topic actually) had a very cool system whereby the user could
> attach arbitrary meta information to a file, and the file types were stored
> as MIME types (so "vacation" would have a type of "image/jpeg").  But again,

It still doesn't address the issue raised by an earlier respondent
(i.e. tagging files for special treatment on open) but that could
be done by fabricating your own file type.

> how can one transfer user added metadata of a file to another system?

This would be worth looking into.  Does BeOS run on "special"
hardware?

>   -spc (So it just kind of evolved that the file type is tacked on to the
> 	end of the file name.)

