Section 10: Method Invocation in Object-Oriented Languages

 

 

 

An Object-Oriented Language Interpreter

In object-oriented languages, all values are objects. An object is a tuple (like a C struct, a Scheme list, a Pascal record, or an O'Caml tuple) that can include code as well as data. Objects only interact by sending messages between one another. Every object is an instance of some class, which defines what messages the object will accept and act upon. The object being sent the message is called the receiver.

In this lecture, we'll develop an interpreter for Featherweight Java (FJ). FJ is a greatly simplified version of Java with very clean semantics, while still preserving all of the object-oriented-ness of Java.

FJ objects accepts two sorts of messages. Method invocations instruct an object to perform a computation. Field selects request some stored values. The class of an object defines what fields can be selected and what methods can be invoked. Fields and methods are collectively called members.

A key feature of object-oriented languages is inheritance. Every class C has a designated superclass S that it extends. A class accepts all messages that its superclass accepts. In FJ, this means that every member of class S is automatically a member of class C (we say C inherits from S). Frequently, there is a special built-in class (called Object in Java) that is the top-level class in the class hierarchy (every class inherits from Object).

Implementing inheritance requires dynamic dispatch. When a class C redefines a method m inherited from its superclass S, the inherited definition of method m is replaced with the definition provided in C. This process is called method overriding and has profound consequences for the behavior of method invocations on an object of class C. If other (non-overridden) methods inherited by C from S invoke the overridden method m they execute the method definition for m in C rather than the definition for m in S because m is retrieved from the receiver object which is an instance of class C, not class S.

In addition to dynamic dispatch, method invocation is also notable in that a method takes the object in which it resides as a hidden argument. Within the body of the method, a special keyword (this in FJ) is used to refer to this hidden argument. The invocation of object o's method m (commonly written o.m(a1,...,an), where o is some expression that evaluates to an object) should proceed as follows:

  1. Evaluate the object expression to an object
  2. Get the classname from that object
  3. Lookup the class in the class table and find the method in that class
  4. Evaluate the arguments passed to the method
  5. Extend the environment by this and the arguments
  6. Evaluate the contained expression

We begin the interpreter by presenting some helper definitions. A map is an ('a * 'b) list, where 'a maps to 'b for each element in the list

type ('a,'b) map = ('a * 'b) list

let empty = []

Lookup symbol k' in map x (maps are described above). This is used for:

The polymorphism in this type (the 'a) is what allows us to use lookup for these three different tasks.

(* symbol -> (symbol * 'a) list -> 'a *)

let rec lookup k' x =
  match x with 
    ((k,v)::l) -> if k=k' then Some v else lookup k' l
  | _          -> None

This is a handy function that takes two lists and makes them into a list of pairs. It's like a "point-wise merge" or glueing together two tables.

(* ('a list * 'b list) -> ('a * 'b) list *)

let rec zip (a,b) =
  match (a,b) with
    (x::xs, y::ys) -> (x,y) :: zip (xs,ys)
  | _              -> []

The AST for Featherweight Java

Now we define the AST's for the different syntactic categories in our language. FJ will have variables, class names, method names, and field names. For clarity, we'll just introduce these different type synonyms for the names of each of these things, even though they are all just strings

type var       = string  (* including "this" *)
type className = string
type methName  = string
type fieldName = string

The main type for our AST will be the one for expressions, which will include variables, invocations, (method) selection, and new object allocation.

type exp = 
    Var of var 
  | Inv of exp * methName * exp list 
  | Sel of exp * fieldName
  | New of className * exp list

Finally, we need types for the components of a class definition. A class is defined by its superclass, list of field names, and a list of method definitions.

type methDef    = var list * exp
type classDecl  = className * fieldName list * (methName, methDef) map
type classTable = (className, classDecl) map
type program    = classTable * exp

Now we turn our attention to modeling values in our language. The only type of value in FJ is an object. An object is defined by its class and the values of its data fields; this includes the values for all superclass fields as well. The object's methods are shared among all instances of the class, and are most commonly stored with the class.

type value = Obj of className * value list

Some Example Classes

Now that we've defined an AST, let's look at some example classes:

We use myObject as the root of our class hierarchy.

(* class myObject { } *)
let myObject = ("",[],empty)

Class thunk simply contains a stored value.

(* class thunk ext myObject { myObject val; myObject get () { this.val } } *)
let thunk = ("myObject", ["val"], [("get", ([], Sel (Var "this", "val")))])

A boolean class. Since it is neither true nor false, the if method diverges.

(* class bool ext myObject { myObject if (thunk x, thunk y) { this.if(x,y) } } *)
let bool = ("myObject", [], [("if", (["x";"y"], Inv (Var "this", "if", [Var "x";Var "y"])))])

Define true such that true.if returns its first argument.

(* class true ext bool { myObject if (thunk x, thunk y) { x.get() } } *)
let myTrue = ("bool", [], [("if", (["x";"y"], Inv (Var "x", "get", [])))])

And define false such that false.if returns its second argument.

(* class myFalse ext bool { myObject if (thunk x, thunk y) { y.get() } } *)
let myFalse = ("bool", [], [("if", (["x";"y"], Inv (Var "y", "get", [])))])

Interpreting Featherweight Java

To enable dynamic dispatch, method lookup must search not just the specified class, but all superclasses up to the top level. The method lookup function looks up the method in the class's method list, and if the method is not found, the lookup recurs on the superclass.

exception EvalError of string

(* class table -> method name -> class name -> methodDef *)
let rec lookupMeth table methodname classname =
  match lookup classname table with
      None -> raise(EvalError("Error: class not found: "^classname))
    | Some (super,_,meths) -> 
	match lookup methodname meths with
            Some x -> x
	  | None   -> lookupMeth table methodname super

Field lookup similarly must be modified to search all superclasses as well as the specified class. The field lookup function looks up the field in the class's field list. If it finds the field, it returns the corresponding value from the object's value list. Otherwise, the lookup recurs on the superclass.

(* class table -> field name -> (class name * field value list) -> value *)

let rec lookupField table fieldname (classname,fieldvaluelist) =
  match lookup classname table with 
      None -> raise(EvalError("Error: class not found: "^classname))
    | Some (super,fields,_) ->
 	let rec search (a,b) = 
          match (a,b) with
              (name::namelist, value::valuelist) ->
		if fieldname=name then value else search (namelist,valuelist)
            | ([],valuelist) ->
		lookupField table fieldname (super,valuelist)
	    | (_,[]) ->
		raise(EvalError("Error: Some fields not defined."))
	in search (fields,fieldvaluelist)

Now we're ready to write the main interpreter function for FJ. Evaluation of Java proceeds exactly the same as evaluation of Jam or Lambda Calculus. We match on the expression:

(* interp: class table -> env -> exp -> result *)
let rec interp table = 
  let interpT env =
    let rec interpTE =
      fun a -> match a with 
          Var x -> let v' = lookup x env in
	    (match v' with
		 Some v -> v
	       | None   ->
		   raise(EvalError("Error: variable not found: "^x)))
	| Inv (objexp,methodname,arglist) -> 
	    let v = (interpTE objexp) in
	    let classname = match v with
		(Obj (c,_)) -> c in
	    let z = lookupMeth table methodname classname
	    and argvallist = (List.map interpTE arglist) in
            let (varlist,evalmeth) = z in
	      interp table (("this",v)::(zip (varlist,argvallist))) evalmeth
	| Sel (objexp,fieldname)    -> 
	    let Obj (classname,fieldvaluelist) = (interpTE objexp)
	    in (lookupField table fieldname (classname,fieldvaluelist))
	| New (classname,fieldvalueslist)   ->
 	    Obj (classname, (List.map interpTE fieldvalueslist))
    in interpTE
  in interpT

A Data Encoding for Featherweight Java

Recall how Church numerals used nested function calls to represent the natural numbers in Lambda Calculus. We similarly represent numbers in FJ as essentially a linked list:

Class zero is the base case of this representation, and class succ defines all succeeding numbers in terms of their predecessor. So by counting the number of steps before we reach zero, we can easily determine the value of a number. We define addition, multiplication, and exponentiation as iterated applications of successor, addition, and multiplication, respectively.

We begin with an interface for computation. Since FJ is not statically typed, we don't need to include this interface to correctly interpret the code. It is included because in standard Java, it would be required.

(* interface numCont ext myObject { num app (num arg) } *)

We continue with an abstract class for number. Numbers contain a predicate for equivalence to zero, an interation method, and methods to compute addition, multiplication, and exponentiation.

(* class num ext myObject { ; myObject ifz (thunk z, numCont s) { z.get() } myObject iter (thunk z, numCont s) { z.get() } num add (num x) { this.iter (new thunk (x), new addC ()) } num mul (num x) { this.iter (new thunk (new zero ()), new mulC (x)) } num exp (num x) { x.iter (new thunk (new succ (new zero ())), new expC (this)) } } *)
let num = ("myObject", [], [ ("ifz", (["z";"s"], Inv (Var "z", "get", []))); ("iter", (["z";"s"], Inv (Var "z", "get", []))); ("add", (["x"], Inv (Var "this", "iter", [New ("thunk", [Var "x"]); New ("addC", [])]))); ("mul", (["x"], Inv (Var "this", "iter", [New ("thunk", [New ("zero", [])]); New ("mulC", [Var "x"])]))); ("exp", (["x"], Inv (Var "x", "iter", [New ("thunk", [New ("succ", [New ("zero", [])])]); New ("expC", [Var "this"])])))])

We now define classes used by class num to compute addition, multiplication, and exponentiation. Note that these classes all implement the numCont interface specified above. However, since FJ is not statically typed, we don't need interfaces.

(* class addC ext myObject imp numCont { num app (num x) { new succ (x) } } *)
let addC = ("myObject", [], [("app", (["x"], New ("succ", [Var "x"])))])
(* class mulC ext myObject imp numCont { num cand; num app (num x) { this.cand.add (x) } } *)
let mulC = ("myObject", ["cand"], [("app", (["x"], Inv (Sel (Var "this", "cand"), "add", [Var "x"])))])
(* class expC ext myObject imp numCont { num base; num app (num x) { this.base.mul (x) } } *)
let expC = ("myObject", ["base"], [("app", (["x"], Inv (Sel (Var "this", "base"), "mul", [Var "x"])))])

Finally, we define zero, and succ(n). This is all that is needed to represent all natural numbers.

(* class zero ext num {} *)
let zero = ("num", [], empty)
(* class succ ext num { num pred; myObject ifz (thunk z, numCont s) { s.app (this.pred) } myObject iter (thunk z, numCont s) { s.app (this.pred.iter (z,s)) } } *)
let succ = ("num", ["pred"], [ ("ifz", (["z";"s"], Inv (Var "s", "app", [Sel (Var "this", "pred")]))); ("iter", (["z";"s"], Inv (Var "s", "app", [Inv (Sel (Var "this", "pred"), "iter", [Var "z"; Var "s"])])))])

These functions convert O'Caml's internal representation of numbers to the FJ representation and back:

(* build: int -> FJ number *)

let rec build i =
  if i = 0 then (New ("zero",[]))
  else (New ("succ", [build (i-1)]))

(* count: FJ number -> int *)

let rec count o =
  match o with
      (Obj ("zero",[]))   -> 0
    | (Obj ("succ", [v])) -> count v + 1
    | _                   -> raise(EvalError("Error: Invalid number."))

All of these code samples depend on (i.e. must be interpreted using) the following class table, because superclasses are looked up by string names, not by reference.

let topenv = [("myObject",myObject); ("thunk",thunk); ("bool",bool);
  ("myTrue",myTrue); ("myFalse",myFalse); ("num",num); ("zero",zero);
  ("succ",succ); ("addC",addC); ("mulC",mulC); ("expC",expC)]

Closures in Featherweight Java

In previous lectures, we've discussed implementing imperative programming features (such as gotos) in Lambda Calculus. In this lecture, we implemented an interpreter for FJ, an object-oriented language, in O'Caml. Now, we flip perspectives. We want to use FJ, whose semantics we just specified, to implement a feature common in functional languages: closures.

In LC, closures close over the environment in which they are created. That is, the expression contained within the closure is evaluated within this environment. In Java, you create a closure by defining an anonymous inner class within a method. Any method of the anonymous inner class is evaluated in the environment of the method (with local variables appropriately defined). An example is as follows:

public class c {
  Object g() {
    Object y;
    Object x = new Object() {
      Object f() {
        ...
      } 
    }
    return x;
  }
}

At this point, when object x's method f() is invoked, even after method g() returns, g()'s local variables (including y and this) can still be accessed.

We can do this in FJ as well (and this is how Java actually implements closures). We begin by defining a new class called c'. c' will contain as fields each element of the environment (in the above case, x, y, and this (renamed this0 to avoid ambiguity). Now any references to x, y, or this in f() are renamed to field selects on this.x, this.y, and this.this0, respectively. The effect is that instances of class c' are closed over the environment in which they were statically declared.

public class c' {
  Object x;
  Object y;
  Object this0

  Object f() {
    ...
  } 
}

Suppose we want to implement λ x.λ y.x in Java.

public class c {
  void f(Object x) {
    return new Object() {
      Object g(Object y) {
        return x;
      }
    }
  }
}

Class c's method f() is the λ x. The anonymous inner class's method g() is the λ y. The x returned by g() is the same x that was passed in to f. In FJ, We implement this again by explicitly defining the anonymous inner class. Java actually converts closures to this form to interpret them.

public class c {
  void f(Object x) {
    return new c'(this,x);
  }
}

public class c'{
  Object x;
  Object this0;

  void g(Object y) {
    return x;
  } 
}

Side Note: In Java, any variables declared within a function and used by a closure within the function must be declared final, which means the variable cannot be assigned. (Objects assigned to variables declared final can be mutated, but the variable cannot later be assigned to directly.).
To understand why, look at the following example, and consider which y should f() return, the one that existed when the closure was created, or the one in the environment as it stood at the end of the method. There is a great potential for bugs as a result of either choice and the associated confusion; to avoid this confusion, Java requires that any such variables not be reassignable.
public class c {
  Object g() {
    Object y = null;
    Object x = new Object() {
      Object f() {
        return y;
      }
    }

    y = new Object();
    return x;
  }
}

Back to course website