Extended Exercise: Binary Search Trees, Part 1

Programmers often work with trees, though rarely with family trees. A particularly well-known form of tree is the binary search tree. Many applications employ binary search trees to store and to retrieve information. To be concrete, we discuss binary trees that manage information about people. In this context, a binary tree is similar to a family tree, but instead of child structures it contains nodes:
(define-struct node (ssn name left right))
Here we have decided to record the social security number, the name, and two other trees. The latter are like the parent fields of family trees, though the relationship between a node and its left and right trees is not based on family relationships. The corresponding data definition is just like the one for family trees:
A binary tree (BT) is either
  1. false or
  2. (make-node soc pn lft rgt)
    where soc is a number, pn is a symbol, and lft and rgt are BTs.
The choice of false to indicate lack of information is arbitrary. We could have chosen empty again, but false is an equally good and an equally frequent choice that we should become familiar with. Here are two binary trees:

   (make-node
     15
     'd
     false
     (make-node 24 'i false false))

   (make-node
     15
     'd
     (make-node 87 'h false false)
     false)

The figure below shows how we should think about such trees. The trees are drawn upside down, that is, with the root at the top and the crown of the tree at the bottom. Each circle corresponds to a node, labeled with the ssn field of a corresponding node structure. The trees omit false.
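To experiment with the structure definition and the two example trees, they can be entered as in the following minimal sketch. It assumes plain Racket (`#lang racket`) rather than a teaching language; the names `bt-1`, `bt-2`, and `count-nodes` are made up for illustration and do not come from the text. The helper `count-nodes` simply follows the shape of the data definition: one clause for false, one for a node with two recursive calls.

```racket
#lang racket
;; The structure definition from the text.
(define-struct node (ssn name left right))

;; The two example trees (bt-1 and bt-2 are illustrative names).
(define bt-1
  (make-node 15 'd false (make-node 24 'i false false)))
(define bt-2
  (make-node 15 'd (make-node 87 'h false false) false))

;; count-nodes : BT -> number
;; counts the node structures in a binary tree (hypothetical helper)
(define (count-nodes bt)
  (cond
    [(false? bt) 0]
    [else (+ 1
             (count-nodes (node-left bt))
             (count-nodes (node-right bt)))]))

(node-ssn bt-1)               ; 15
(node-name (node-right bt-1)) ; 'i
(node-left bt-1)              ; false
(count-nodes bt-2)            ; 2
```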
Exercise 14.2.1 Draw the two trees above in the manner of the figure below. Develop contains-bt. The function consumes a number and a BT and determines whether the number occurs in the tree. Develop search-bt. The function consumes a number n and a BT. If the tree contains a node structure whose ssn field is n, the function produces the value of the name field in that node. Otherwise, the function produces false. Hint: Use contains-bt to determine whether to search in a subtree of the given binary tree.



Figure: A binary search tree and a binary tree


Both trees in the figure above are binary trees, but they differ in a significant way. If we read the numbers in the two trees from left to right, we obtain two sequences:

  A: 10 15 24 29 63 77 89 95 99
  B: 87 15 24 29 63 33 89 95 99

The sequence for tree A is sorted in ascending order; the one for B is not. A binary tree that has an ordered sequence of information is a BINARY SEARCH TREE. Every binary search tree is a binary tree, but not every binary tree is a binary search tree. We say that the class of binary search trees is a PROPER SUBCLASS of that of binary trees. To define the class of binary search trees rigorously, we formulate a condition that distinguishes a binary search tree from a binary tree:

The BST Invariant A binary search tree (BST) is a BT:

  1. false is always a BST;
  2. (make-node soc pn lft rgt) is a BST if
    1. lft and rgt are BSTs,
    2. all ssn numbers in lft are smaller than soc, and
    3. all ssn numbers in rgt are larger than soc.
The second and third conditions are different from what we have seen in previous data definitions. They place an additional and unusual burden on the construction of BSTs. We must inspect all numbers in these trees and ensure that they are smaller (or larger) than soc.
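The three conditions of the invariant can be read almost directly as a predicate. The following is only a sketch, assuming plain Racket (`#lang racket`); the names `bst?` and `all-ssn?` are hypothetical helpers, not functions the text defines. Note how conditions 2 and 3 force a traversal of the entire subtree, which is exactly the extra burden described above.

```racket
#lang racket
(define-struct node (ssn name left right))

;; all-ssn? : (number -> boolean) BT -> boolean
;; does every ssn number in bt satisfy ok? (hypothetical helper)
(define (all-ssn? ok? bt)
  (cond
    [(false? bt) true]
    [else (and (ok? (node-ssn bt))
               (all-ssn? ok? (node-left bt))
               (all-ssn? ok? (node-right bt)))]))

;; bst? : BT -> boolean
;; checks the BST Invariant, one conjunct per condition
(define (bst? bt)
  (cond
    [(false? bt) true]                                   ; clause 1
    [else
     (and (bst? (node-left bt))                          ; condition 1
          (bst? (node-right bt))
          (all-ssn? (lambda (s) (< s (node-ssn bt)))     ; condition 2
                    (node-left bt))
          (all-ssn? (lambda (s) (> s (node-ssn bt)))     ; condition 3
                    (node-right bt)))]))

;; The two small example trees from the beginning of the section:
(bst? (make-node 15 'd false (make-node 24 'i false false))) ; true
(bst? (make-node 15 'd (make-node 87 'h false false) false)) ; false, 87 > 15

;; Each node is fine relative to its own children, yet 87 sits in the
;; left tree of 63, so condition 2 fails at the root:
(bst? (make-node 63 'a
                 (make-node 29 'b false (make-node 87 'h false false))
                 false))                                      ; false
```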
Exercise 14.2.2 Develop the function inorder. It consumes a binary tree and produces a list of all the ssn numbers in the tree. The list contains the numbers in the left-to-right order we have used above. Hint: Use the Scheme operation append, which concatenates lists. Here is an example:
  (append (list 1 2 3) (list 4) (list 5 6 7))
= (list 1 2 3 4 5 6 7)
What does inorder produce for a binary search tree?
Looking for a specific node in a BST takes fewer steps than looking for the same node in a BT. To find out whether a BT contains a node with a specific ssn field, a function may have to look at every node of the tree. In contrast, searching a binary search tree requires far fewer inspections. Suppose we are given the BST:
(make-node 66 'a L R)
If we are looking for 66, we have found it. Now suppose we are looking for 63. Given the above node, we can focus the search on L because all nodes with ssns smaller than 66 are in L. Similarly, if we were to look for 99, we would ignore L and focus on R because all nodes with ssns larger than 66 are in R.
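To make the savings visible, the sketch below records which nodes a search inspects. It assumes plain Racket (`#lang racket`); `visited-ssns` is a hypothetical helper, and the subtrees standing in for L and R are made up for illustration. At each node a single comparison decides which one subtree can still contain the number, so the other subtree is never touched.

```racket
#lang racket
(define-struct node (ssn name left right))

;; visited-ssns : number BST -> list of numbers
;; lists the ssn of every node inspected while looking for n
;; (hypothetical helper, not part of the text)
(define (visited-ssns n bst)
  (cond
    [(false? bst) empty]                          ; ran out of tree: n is absent
    [(= n (node-ssn bst)) (list (node-ssn bst))]  ; found it
    [(< n (node-ssn bst))                         ; n can only be in the left tree
     (cons (node-ssn bst) (visited-ssns n (node-left bst)))]
    [else                                         ; n can only be in the right tree
     (cons (node-ssn bst) (visited-ssns n (node-right bst)))]))

;; L and R are replaced by small made-up subtrees.
(define example
  (make-node 66 'a
             (make-node 53 'b false false)
             (make-node 99 'o false false)))

(visited-ssns 66 example) ; (list 66)
(visited-ssns 63 example) ; (list 66 53), R is never inspected
(visited-ssns 99 example) ; (list 66 99), L is never inspected
```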
Exercise 14.2.3 Develop search-bst. The function consumes a number n and a BST. If the tree contains a node structure whose ssn field is n, the function produces the value of the name field in that node. Otherwise, the function produces false. The function organization must exploit the BST Invariant so that the function performs as few comparisons as possible. Compare searching in binary search trees with searching in sorted lists (see the corresponding earlier exercise).
Building a binary tree is easy; building a binary search tree is a complicated, error-prone affair. To create a BT, we combine two BTs, an ssn number, and a name with make-node. The result is, by definition, a BT. For a BST, this procedure fails because the result would typically not be a BST. For example, if one tree contains 3 and 5, and the other one contains 2 and 6, there is no way to join these two BSTs into a single binary search tree with make-node. We can overcome this problem in (at least) two different ways. First, given a list of numbers and symbols, we can determine by hand what the corresponding BST should look like and then use make-node to build it. Second, we can write a function that builds a BST from the list, one node after another.
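The 3/5 versus 2/6 example can be checked mechanically. The sketch below assumes plain Racket (`#lang racket`), two non-empty trees, and that any number may serve as the new root; `ssns`, `joinable?`, the tree names, and the symbols stored in the nodes are all hypothetical. A root soc satisfying conditions 2 and 3 exists only if every number in the left tree is smaller than every number in the right tree.

```racket
#lang racket
(define-struct node (ssn name left right))

;; ssns : BT -> list of numbers
;; all ssn numbers in bt, in no particular order (hypothetical helper)
(define (ssns bt)
  (cond
    [(false? bt) empty]
    [else (cons (node-ssn bt)
                (append (ssns (node-left bt))
                        (ssns (node-right bt))))]))

;; joinable? : BT BT -> boolean   (both trees must be non-empty)
;; is there a number soc with all of lft < soc < all of rgt?
(define (joinable? lft rgt)
  (< (apply max (ssns lft)) (apply min (ssns rgt))))

(define t-3-5 (make-node 3 'x false (make-node 5 'y false false)))
(define t-2-6 (make-node 2 'u false (make-node 6 'v false false)))
(define t-2-3 (make-node 2 'u false (make-node 3 'x false false)))
(define t-5-6 (make-node 5 'y false (make-node 6 'v false false)))

(joinable? t-3-5 t-2-6) ; false: 5 is not smaller than 2
(joinable? t-2-6 t-3-5) ; false: 6 is not smaller than 3
(joinable? t-2-3 t-5-6) ; true: for example, 4 could be the root
```

Swapping the arguments does not help in the first case, which is why the text concludes that the two trees cannot be joined directly.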
Exercise 14.2.4 Develop the function create-bst. It consumes a BST B, a number N, and a symbol S. It produces a BST that is just like B and that in place of one false subtree contains the node structure
(make-node N S false false)
Test the function with (create-bst false 66 'a); this should create a single node. Then show that the following holds:
  (create-bst (create-bst false 66 'a) 53 'b)
= (make-node 66
             'a
             (make-node 53 'b false false)
             false)
Finally, create tree A from the figure above using create-bst.
Exercise 14.2.5 Develop the function create-bst-from-list. It consumes a list of numbers and names; it produces a BST by repeatedly applying create-bst. The data definition for a list of numbers and names is as follows:
A list of numbers and names (list-of-number/name) is either
  1. empty or
  2. (cons (list ssn nom) lonn)
    where ssn is a number, nom is a symbol,
    and lonn is a list-of-number/name.
Consider the following examples:
(define sample
  '((99 o)
    (77 l)
    (24 i)
    (10 h)
    (95 g)
    (15 d)
    (89 c)
    (29 b)
    (63 a)))

(define sample
  (list (list 99 'o)
        (list 77 'l)
        (list 24 'i)
        (list 10 'h)
        (list 95 'g)
        (list 15 'd)
        (list 89 'c)
        (list 29 'b)
        (list 63 'a)))
They are equivalent, although the first one is defined with the quote abbreviation and the second one using list. The left tree in the figure above is the result of using create-bst-from-list on this list.