Defining Data Classes and Refining Them

Let's develop a data representation for file systems using the method of iterative refinement. The first decision we need to make is what we wish to focus on and what we wish to ignore. Consider the directory tree in figure [figaccountant] and let's imagine how it is created. When a user first creates a directory, it is empty. As time goes by, the user adds files and directories. In general, a user refers to files by names but thinks of directories as containers of other things.
Model 1: Our thought experiment suggests that our first, most primitive model should focus on files as atomic entities, say a symbol that represents a file's name, and on the directories' nature as containers. More concretely, we should think of a directory as just a list that contains files and directories. All of this suggests the following two data definitions:
A file is a symbol.
A directory (dir) is either
  1. empty;
  2. (cons f d) where f is a file and d is a dir; or
  3. (cons d1 d2) where d1 and d2 are dirs.
The first data definition says that files are represented by their names. The second one captures how a directory is gradually constructed by adding files and directories. A closer look at the second data definition shows that the class of directories is the class of Web pages of section [secdocs]. Hence, we can reuse the template for Web-page processing functions to process directory trees. If we were to write a function that consumes a directory (tree) and counts how many files are contained, it would be identical to a function that counts the number of words in a Web tree.
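To see how the three clauses of the model 1 definition drive a function, here is a minimal sketch in the book's Scheme dialect (it relies on empty, first, rest, and symbol=?, as in the teaching languages). The directory sample-dir and the function occurs? are hypothetical illustrations, not part of the original text; occurs? merely follows the Web-page template with one cond clause per clause of the data definition.

```scheme
;; A hypothetical model-1 directory: the files 'notes and 'todo
;; plus one subdirectory that contains the single file 'draft.
(define sample-dir
  (cons 'notes
        (cons (cons 'draft empty)
              (cons 'todo empty))))

;; occurs? : symbol dir -> boolean
;; to determine whether a file named f occurs anywhere in the tree d
(define (occurs? f d)
  (cond
    ;; clause 1: the empty directory contains nothing
    [(empty? d) false]
    ;; clause 2: the first item is a file (a symbol)
    [(symbol? (first d))
     (or (symbol=? (first d) f)
         (occurs? f (rest d)))]
    ;; clause 3: the first item is itself a directory
    [else
     (or (occurs? f (first d))
         (occurs? f (rest d)))]))
```

With these definitions, (occurs? 'draft sample-dir) produces true and (occurs? 'missing sample-dir) produces false. Note that only the test (symbol? (first d)) distinguishes a file from a nested directory, which is one symptom of how little this model says about directories.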
Exercise 16.2.1 Translate the file system in figure [figaccountant] into a Scheme representation according to model 1.
Exercise 16.2.2 Develop the function how-many, which consumes a dir and produces the number of files in the dir tree.
Model 2: While the first data definition is familiar to us and easy to use, it obscures the nature of directories. In particular, it hides the fact that a directory is not just a collection of files and directories but has several interesting attributes. To model directories in a more faithful manner, we must introduce a structure that collects all relevant properties of a directory. Here is a minimal structure definition:
(define-struct dir (name content))
It suggests that a directory has a name and a content; other attributes can now be added as needed. The intention of the new definition is that a directory has two attributes: a name, which is a symbol, and a content, which is a list of files and directories. This, in turn, suggests the following data definitions:
A directory (dir) is a structure:

(make-dir n c) where n is a symbol and c is a list of files and directories.

A list of files and directories (LOFD) is either

  1. empty;
  2. (cons f d) where f is a file and d is a LOFD; or
  3. (cons d1 d2) where d1 is a dir and d2 is a LOFD.
Since the data definition for dir refers to the definition for LOFDs, and the definition for LOFDs refers back to that of dirs, the two are mutually recursive definitions and must be introduced together. Roughly speaking, the two definitions are related like those of parent and list-of-children in section [secmutrefdd]. This, in turn, means that the design recipe for programming from section [secmutualdesign] applies directly to dirs and LOFDs. More concretely, to design a function that processes dirs, we must develop templates for dir-processing functions and LOFD-processing functions in parallel.
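The following sketch illustrates what developing the two functions in parallel can look like for model 2. The names count-dirs, count-dirs-lofd, and sample-dir are assumptions made for this illustration; files are still plain symbols, as in model 1. Each function refers to the other exactly where its data definition refers to the other definition.

```scheme
(define-struct dir (name content))

;; count-dirs : dir -> number
;; to count d itself plus all directories nested inside it
(define (count-dirs d)
  (+ 1 (count-dirs-lofd (dir-content d))))

;; count-dirs-lofd : LOFD -> number
;; to count all directories that occur in the list l, at any depth
(define (count-dirs-lofd l)
  (cond
    [(empty? l) 0]
    ;; a file contributes no directories
    [(symbol? (first l)) (count-dirs-lofd (rest l))]
    ;; a dir: hand it to the dir-processing function
    [else (+ (count-dirs (first l))
             (count-dirs-lofd (rest l)))]))

;; A hypothetical directory with one file and one subdirectory:
(define sample-dir
  (make-dir 'root
            (cons 'readme
                  (cons (make-dir 'docs (cons 'intro empty))
                        empty))))
```

For this sample, (count-dirs sample-dir) produces 2: the directory root itself and its subdirectory docs.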
Exercise 16.2.3 Show how to model a directory with two more attributes: a size and a systems attribute. The former measures how much space the directory itself (as opposed to its files and subdirectories) consumes; the latter specifies whether the directory is recognized by the operating system.
Exercise 16.2.4 Translate the file system in figure [figaccountant] into a Scheme representation according to model 2.
Exercise 16.2.5 Develop the function how-many, which consumes a dir according to model 2 and produces the number of files in the dir tree.
Model 3: The second data definition refined the first one with the introduction of attributes for directories. Files also have attributes. To model those, we proceed just as above. First, we define a structure for files:
(define-struct file (name size content))
Second, we provide a data definition:
A file is a structure:

(make-file n s x) where n is a symbol, s is a number, and x is some Scheme value.

For now, we think of the content field of a file as set to empty. Later, we will discuss how to get access to the data in a file. Finally, let's split the content field of dirs into two pieces: one for a list of files and one for a list of subdirectories. The data definition for a list of files is straightforward and relies on nothing but the definition for files:
A list of files is either
  1. empty, or
  2. (cons s lof) where s is a file and lof is a list of files.
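Because a list of files depends on nothing but the file structure, a function on it is ordinary structural recursion over a list. As a small sketch, the hypothetical function files-size below adds up the size fields of all files in a list; the file names and sizes in some-files are made up for the example.

```scheme
(define-struct file (name size content))

;; files-size : list-of-files -> number
;; to add up the sizes of all files on lof
(define (files-size lof)
  (cond
    [(empty? lof) 0]
    [else (+ (file-size (first lof))
             (files-size (rest lof)))]))

;; Two hypothetical files whose content field is empty for now:
(define some-files
  (cons (make-file 'memo 10 empty)
        (cons (make-file 'budget 25 empty)
              empty)))
```

Here (files-size some-files) produces 35.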
In contrast, the data definitions for dirs and their lists of subdirectories still refer to each other and must therefore be introduced together. Of course, we first need a structure definition for dirs that has a field for files and another one for subdirectories:
(define-struct dir (name dirs files))
Here are the data definitions:
A dir is a structure:

(make-dir n ds fs) where n is a symbol, ds is a list of directories, and fs is a list of files.

A list of directories is either

  1. empty or
  2. (cons s lod) where s is a dir and lod is a list of directories.
This third model (or data representation) of a directory hierarchy captures the nature of a file system as a user typically perceives it. With two structure definitions and four data definitions, it is, however, far more complicated than the first model. But, by starting with the simple representation of the first model and refining it step by step, we have gained a good understanding of how to work with this complex web of classes. It is now our job to use the design recipe from section [secmutualdesign] for developing functions on this set of data definitions. Otherwise, we cannot hope to understand our functions at all.
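As one possible illustration of the recipe at work on model 3, the sketch below computes how many levels of directories a tree has. The functions depth and depth-lod and the directory sample-dir are hypothetical names introduced here, not definitions from the text. Only the mutual reference between dir and list of directories calls for a pair of functions; the list of files plays no role in this particular computation.

```scheme
(define-struct file (name size content))
(define-struct dir (name dirs files))

;; depth : dir -> number
;; to compute the number of directory levels in d, counting d itself
(define (depth d)
  (+ 1 (depth-lod (dir-dirs d))))

;; depth-lod : list-of-directories -> number
;; to find the largest depth among the directories on lod; 0 if none
(define (depth-lod lod)
  (cond
    [(empty? lod) 0]
    [else (max (depth (first lod))
               (depth-lod (rest lod)))]))

;; A hypothetical tree: root contains docs and lib; docs contains drafts.
(define sample-dir
  (make-dir 'root
            (cons (make-dir 'docs
                            (cons (make-dir 'drafts empty empty) empty)
                            (cons (make-file 'intro 10 empty) empty))
                  (cons (make-dir 'lib empty empty) empty))
            (cons (make-file 'readme 5 empty) empty)))
```

For this tree, (depth sample-dir) produces 3, because the longest chain of nested directories is root, docs, drafts.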