2.1.3 Immutable Sequences

Next: 2.1.4 Mutable Sequences Up: 2.1 Sequences Previous: 2.1.2 Lists

2.1.3 Immutable Sequences

An immutable sequence type Seq with elements of type T supports some subset of the following operations on a Seq object this representing s₀, s₁, ..., s_n, n>=0:

interface Seq {
  Seq empty();
  // returns Seq that is empty

  Seq cons(T newElt);
  // returns Seq with elts newElt, s[0], ..., s[n]

  Seq tail();       // yields an object representing s[1], ..., s[n]

  T head();         // yields the element s[0]

  Seq eltAt(int i); // yields the element s[i]
  int length();
  // returns the int n+1, the number of elements in the sequence
}

There are two widely used representations for immutable sequences: linked and contiguous.

In the linked representation, a sequenceis a pointer to an object, which is either an empty node, representing the empty sequence, or a cons node with a field of type T containing the first element of the sequences and a field of type Seq containing a pointer to the first node in the rest of the sequence. This data representation, which is often called a linked list, directly corresponds to the standard inductive definition of sequences. We used this representation to implement the DeptDirectory data type. The following composite class hierarchy (with method definitions omitted) defines a linked list representations for sequences of type T in Java:

abstract class List implements Seq {
  List empty() { return Empty.only; }
  List cons(T head, List tail)  { return new Cons(head, tail); }
  abstract T eltAt(int i);
  ...
}

class Empty extends List {
  static final Empty only = new Empty();
  private Empty() {}  
  // ensure that only one instance (bound to only) of
  /    Empty is created
  T eltAt(int i) { throw SequenceBoundsException(); }
}

class Cons extends List {
  final T first;
  final List rest;
  Cons(T f, List r) {
    first = f;
    rest = r;
  }
  T eltAt(int i) { 
    if (0 == i) return first;
    else return rest.eltAt(i-1);
  }
  ...
}

class SequenceBoundsException extends RuntimeException {}

The following figure shows a picture of linked list of integers.

$\includegraphics{gif/new-linked-list.ps}$

The nodes with two fields are Cons instances, and the crossed-box is an Empty instance. Pointers are represented by the heavy arrows. The pointer fields in the cells are in fact memory addresses. In Java, these addresses are always interpreted as references to objects. Java only supports operations on pointers that are consistent with this abstraction, e.g. you cannot perform arithmetic on a pointer. In lower-level languages like C and C++, pointers (addresses) can be manipulated like ordinary integers.

In the contiguous representation, a sequence is represented by a a pointer to an immutable array of fields of type T. An immutable array is an array that, once initialized, is never modified. Java doesn't directly support immutable arrays, but a Java program can enforce immutability by defining a wrapper class for arrays (akin to the Integer class for wrapping ints in objects) with a single private field holding the embedded array and a collection of public methods that do not mutate this field. A lighter weight but less robust protocol for supporting immutable arrays is to use comments to indicate which arrays are immutable and to follow the discipline of never modifying arrays documented as immutable. In either case, a new array object generally must be created whenever an element of the represented sequence is changed, added, or removed. Creating an array object is a costly operation proportional in time and space to the number of elements in the new array.

In the linked representation of sequences, every operation in the collection listed above except eltAt can be performed in constant time. On the other hand, the eltAt operation takes time proportional to the length of the sequence in both the worst case and the typical case. The List implementation of sequences given in chapter 1 has this property.

The performance trade-offs embodied in the immutable array implementation are very different. In this implementation, the operations empty, head, length, eltAt, can be performed in constant time. (In Java implementations, length is stored in a separate ``field'' in the block of storage holding the array.) With the exception of the tail operation, the remaining operations all take time proportional to the length of the sequence. The running time of the tail operation depends on an interesting implementation detail. If immutable arrays are implemented as instances of a ``wrapper'' class, then the tail operation can be performed in constant time at the cost of making an extra field reference in the implementation of eltAt. A wrapper object can store an integer offset that is added to the index passed as an argument to eltAt. In this scheme, the tail operation constructs a new wrapper object containing a pointer to the same array object as this but an increased offset (by one). If immutable arrays are implemented directly by Java arrays, then tail operation must construct a completely new array one element shorter than the original. [Exercises: implement a MutArray wrapper class (a) without an offset field (b) with an offset field. Measure the performance impact of including the offset pointer.]

In practice, array representations of immutable sequences do not rely on the cons and empty operations to construct new sequences. The repeated copying of arrays of involved is very inefficient (proportional to the square of the length of the constructed array!) [Exercise: prove the O(n²) complexity.] The common practice is to provide an operation that constructs an array representation for a sequence given either a corresponding linked representation or mutable array representation.

The array implementation of immutable sequences is a good choice when new sequences are generally built from scratch rather than constructed by applied operations to existing sequences. For this reason, many computations involving immutable sequences of characters (strings) rely on the array representation. The Java String class implements the array representation for immutable sequences of characters (the primitive type char in Java). Note that Java includes an operation for converting mutable strings (represented by the Java class StringBuffer) to immutable strings (represented by the Java class String).

Next: 2.1.4 Mutable Sequences Up: 2.1 Sequences Previous: 2.1.2 Lists

Corky Cartwright
2000-01-07