DFS - The Distributed File System

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What is it?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DFS is a fully distributed file system.  Unlike most network file systems
which operate in a client/server model where one or more servers keep the
master copy and the clients just access it, DFS is actually a peer-to-peer
file system with the ability to handle disconnected operation.  It provides
the accessibility of a network file system, the speed of a local hard drive,
the redundancy of mirroring and the scalability of a RAID.

At least...  That's what it _will_ be.  Right now, it's just a prototype and
proof of concept, with only the speed and redundancy supported to any real
degree.  But looking into the future...

DFS accomplishes this by registering itself with the operating system to
handle file system requests.  When a file is created or modified, DFS
immediately informs all other DFS machines of the change and makes the new
version available for them to fetch.  Keeping a local copy gives us speed and
the ability to operate when disconnected from the network; sending copies to
other machines gives us redundancy.
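In rough sketch form, the write path described above might look something
like the following.  This is purely illustrative Python -- the class and
method names are invented for this example and are not taken from the actual
DFS code:

```python
# Illustrative sketch of DFS-style update propagation.  All names
# (Peer, write, receive_update) are hypothetical, not the real API.

class Peer:
    def __init__(self, name):
        self.name = name
        self.files = {}   # local copies: path -> contents
        self.peers = []   # other DFS hosts on the network

    def write(self, path, data):
        # Keep a local copy first: reads stay local-disk fast and
        # keep working when the network is down.
        self.files[path] = data
        # Then push the new version to every reachable peer,
        # which is where the redundancy comes from.
        for peer in self.peers:
            peer.receive_update(path, data)

    def receive_update(self, path, data):
        self.files[path] = data

a, b, c = Peer("a"), Peer("b"), Peer("c")
a.peers = [b, c]
a.write("/etc/motd", "hello")
print(b.files["/etc/motd"])   # the update reached peer b
```

A real implementation would of course push invalidations or diffs over the
network rather than calling methods on in-memory objects, but the flow --
local write first, then notify the peers -- is the same.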

The real key to DFS, though, is that not all machines will keep copies of all
files.  Each machine will keep the files it uses the most and perhaps a few
others, leaving the remaining files to be stored on other machines but always
ensuring that any given file is held by a minimum number of hosts.  It is this
feature that sets DFS apart from similar schemes such as Coda and InterMezzo.  If
a request is made by an application for a file not held locally, DFS will
automatically and transparently fetch that file from the nearest host that
does have a copy.
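A minimal sketch of that partial-replication idea, again with invented names
(MIN_COPIES, Host, store, read) rather than anything from the real
implementation:

```python
# Hypothetical sketch: replicate only to a minimum number of hosts,
# and fetch transparently from a peer when a file isn't held locally.

MIN_COPIES = 2   # illustrative value for X, the minimum copy count

class Host:
    def __init__(self, name):
        self.name = name
        self.files = {}
        self.peers = []

    def store(self, path, data):
        # Keep the file locally, then replicate only until the
        # minimum copy count is met -- not to every host.
        self.files[path] = data
        needed = MIN_COPIES - 1
        for peer in self.peers:
            if needed == 0:
                break
            peer.files[path] = data
            needed -= 1

    def read(self, path):
        # A local copy wins; otherwise fetch transparently from the
        # first peer that holds the file, keeping a local copy.
        if path in self.files:
            return self.files[path]
        for peer in self.peers:
            if path in peer.files:
                self.files[path] = peer.files[path]
                return self.files[path]
        raise FileNotFoundError(path)

a, b, c = Host("a"), Host("b"), Host("c")
a.peers, c.peers = [b, c], [b, a]
a.store("/data/report", "Q3 numbers")   # copies land on a and b only
print(c.read("/data/report"))           # c fetches from b transparently
```

The interesting part is `read`: the application asking for the file never
sees whether the copy was local or fetched from another host.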

By not requiring each host to have a complete copy of the filesystem, each
host added past X (where X is the minimum number of copies the system will
keep of any given file) contributes approximately linearly to the total
file system space.  It also means that ordinary "client" machines, with disks
typically much smaller than those in "server" machines, can act as peers and
contribute to the whole instead of just using their disk as a cache of what is
stored on the network.
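As a back-of-the-envelope illustration of that scaling claim (the disk sizes
and the value of X below are made up):

```python
# With X copies of every file, total usable space is roughly the sum
# of all disks divided by X, so each added host of size d contributes
# about d/X of usable space.

def usable_space(disks_gb, min_copies):
    return sum(disks_gb) / min_copies

X = 3
disks = [500, 500, 500, 120, 120]   # servers plus small "client" disks
print(usable_space(disks, X))       # 580.0 GB usable
```

Note how the two small "client" disks still add 80 GB of usable space between
them instead of being pure cache.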


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Why was it written?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The idea of DFS was born somewhere in the mid-'90s, for reasons I can no
longer recall, but never got further than a written list of requirements.
Then, some 10 years later, while working out how to expand my company's
network to multiple subnets with shared files, the idea arose again.  None of
NFS, Coda, or InterMezzo had the scalability I wanted, while OceanStore was
far, far beyond what I wanted.  So, being a sucker for punishment (me being
Brian White, by the way), I took many of my free hours to see if I could turn
the idea into a reality.