2.1.4 Immutable Sequence Representations

Next: 2.1.5 Mutable Sequences Up: 2.1 Sequences Previous: 2.1.3 Immutable Sequences

2.1.4 Immutable Sequence Representations

There are two widely used representations for immutable sequences: linked and contiguous.

In the linked representation, a sequenceis a pointer to an object, which is either an empty node, representing the empty sequence, or a cons node with a field of type T containing the first element of the sequences and a field of type Seq containing a pointer to the first node in the rest of the sequence. This data representation, which is often called a linked list, directly corresponds to the standard inductive definition of sequences. We used this representation to implement the DeptDirectory data type. The following composite class hierarchy (with method definitions omitted) defines a linked list representations for sequences of type T in Java:

abstract class List implements Seq {
  public Seq empty() { return Empty.only; }
  public Seq cons(T head, List tail)  { return new Cons(head, tail); }
  public abstract T eltAt(int i);
  ...
}

class Empty extends List {
  static final Empty only = new Empty();
  private Empty() {}  // ensure that only is unique instance of Empty
  public T first() { throw new
    NoSuchElementException("first() invoked on the Empty sequence");
  }
  public Seq rest() { throw new
    NoSuchElementException("rest() invoked on the Empty sequence");
  }
  public T eltAt(int i) { throw new 
    NoSuchElementException("eltAt(" + i + 
                           ") invoked on the Empty sequence");
  }
  public boolean isEmpty() { return true; }
  public Object apply(Visitor v) { return v.forEmpty(this); }
}

class Cons extends List {
  private final T first;
  private final List rest;
  Cons(T f, List r) {
    first = f;
    rest = r;
  }
  public T first() { return first; }
  public Seq rest() { return rest; }
  public boolean isEmpty() { return false; }
  public T eltAt(int i) { 
    if (0 == i) return first;
    else return rest.eltAt(i-1);
  }
  public Object apply(Visitor v) { return v.forCons(this); }
}

The following figure shows a picture of linked list of integers.

$\includegraphics{gif/linked-list.ps}$

The nodes with two fields are Cons instances, and the crossed-box is an Empty instance. Pointers are represented by the heavy arrows. The pointer fields in the cells are in fact memory addresses. In Java, these addresses are always interpreted as references to objects. Java only supports operations on pointers that are consistent with this abstraction, e.g. you cannot perform arithmetic on a pointer. In lower-level languages like C and C++, pointers (addresses) can be manipulated like ordinary integers.

In the contiguous representation, a sequence is represented by a pointer to an immutable array of fields of type T. An immutable array is an array that, once initialized, is never modified. Java doesn't directly support immutable arrays, but a Java program can enforce immutability by defining a wrapper class for arrays (akin to the Integer class for wrapping ints in objects) with a single private field holding the embedded array and a collection of public methods that do not mutate this field. A lighter weight but less robust protocol for supporting immutable arrays is to use comments to indicate which arrays are immutable and to follow the discipline of never modifying arrays documented as immutable. In either case, a new array object generally must be created whenever an element of the represented sequence is changed, added, or removed. Creating an array object is a costly operation proportional in time and space to the number of elements in the new array.

In the linked representation of sequences, every operation in the collection listed above except eltAt can be performed in constant time. On the other hand, the eltAt operation takes time proportional to the length of the sequence in both the worst case and the typical case. The List implementation of sequences given in chapter 1 has this property.

The performance trade-offs embodied in the immutable array implementation are very different. In this implementation, the operations empty, head, length, eltAt, can be performed in constant time. (In Java, length is stored in a separate ``field'' in the block of storage holding the array.) In contrast, the cons operation takes time proportional to the length of the sequence, because the array in this must be copied. The running time of the rest operation depends on an interesting implementation detail. If immutable arrays are implemented as instances of a ``wrapper'' class, then the rest operation can be performed in constant time at the cost of making an extra field reference in the implementation of eltAt. A wrapper object can store an integer offset that is added to the index passed as an argument to eltAt. In this scheme, the rest operation constructs a new wrapper object containing a pointer to the same array object as this but an increased offset (by one). If immutable arrays are implemented directly by Java arrays, then rest operation must construct a completely new array one element shorter than the original.

Optional Exercise Implement a MutArray wrapper class (a) without an offset field (b) with an offset field. Measure the performance impact of including the offset pointer by timing the execution of a test program on both implementations that performs a very large number of eltAt operations.

In practice, array representations of immutable sequences do not rely on the cons and empty operations to construct new sequences. Repeatly copying arrays is very inefficient (proportional to the square of the length of the constructed array!) [Exercise: prove the O(n²) complexity.] The common practice is to provide an operation that constructs an array representation for a sequence given either a corresponding linked representation or mutable array representation.

The array implementation of immutable sequences is a good choice when new sequences are generally built from scratch rather than constructed by applying operations to existing sequences. For this reason, many computations involving immutable sequences of characters (strings) rely on the array representation. The Java String class implements the array representation for immutable sequences of characters (the primitive type char in Java). Note that Java includes an operation for converting mutable strings (represented by the Java class StringBuffer) to immutable strings (represented by the Java class String).

Next: 2.1.5 Mutable Sequences Up: 2.1 Sequences Previous: 2.1.3 Immutable Sequences

Corky Cartwright
2001-08-02