Extended Exercise: More on Web Pages

With mutually referential data definitions we can represent Web pages more faithfully than in our earlier treatment of Web pages. Here is the basic structure definition:
(define-struct wp (header body))
The two fields, header and body, contain the two essential pieces of data in a Web page: a header (title) and a body. The data definition specifies that a body is a list of words and Web pages:
A Web page is a structure:

(make-wp h p) where h is a symbol and p is a (Web) document.

A (Web) document is either

  1. empty,
  2. (cons s p)
    where s is a symbol and p is a document, or
  3. (cons w p)
    where w is a Web page and p is a document.
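
To see how the two definitions refer to each other, it helps to build a few examples of data. The following is a minimal sketch, assuming one of the teaching languages (or plain Racket) in which define-struct, empty, first, and rest are available. The names inner-page and home-page and the words in the bodies are made up for illustration and do not come from the text.

```scheme
;; Structure definition from the text.
(define-struct wp (header body))

;; A Web page whose document contains only words (clauses 1 and 2).
;; The name inner-page is hypothetical.
(define inner-page
  (make-wp 'hobbies (cons 'chess (cons 'hiking empty))))

;; A Web page whose document mixes words with an embedded
;; Web page (clauses 2 and 3). The name home-page is hypothetical.
(define home-page
  (make-wp 'home
           (cons 'welcome
                 (cons inner-page
                       (cons 'bye empty)))))

;; The selectors give access to the two fields.
(wp-header home-page)                     ; the symbol home
(first (wp-body home-page))               ; the symbol welcome
(wp? (first (rest (wp-body home-page))))  ; true: the second item is a Web page
```

Note that home-page contains inner-page as an item of its document, and inner-page in turn contains a document of its own. A function that processes a Web page must therefore be prepared to process documents, and vice versa.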

Exercise 15.3.1 Develop the function size, which consumes a Web page and produces the number of symbols (words) it contains.

Exercise 15.3.2 Develop the function wp-to-file. The function consumes a Web page and produces a list of symbols. The list contains all the words in a body and all the headers of embedded Web pages. The bodies of immediately embedded Web pages are ignored.

Advanced students may wish to develop an alternative version of the function that produces lists of symbols containing the following HTML tags: '<html>, '</html>, '<head>, '</head>, '<body>, '</body>, '<a href=>, and '</a>.

Exercise 15.3.3 Develop the function occurs. It consumes a symbol and a Web page and determines whether the former occurs anywhere in the latter, including the embedded Web pages.

Exercise 15.3.4 Develop the function find. The function consumes a Web page and a symbol. It produces false if the symbol does not occur in the body of the page or its embedded Web pages. If the symbol occurs at least once, it produces a list of the headers that are encountered on the way to the symbol.