Lecture 1, Comp 311

Welcome to this exploration of the landscape of programming languages.

There are several major theses that we can formulate about the role of programming languages in computation. They include:

Thesis 1: You must learn to speak the programming languages that control the technologies of interest (or economic importance) to you.

Over the past four decades, thousands of programming languages have been designed and most have been forgotten, but programming language design is by no means a dead area. In only the past few years, a number of new languages have become prominent (in some cases dominant) in specific computational applications, including AMPL, HTML, Java, Perl, and Active VRML.

Each of these new languages is targeted at a particular technological niche. AMPL is designed for numerically solving systems of partial differential equations; HTML is a "mark-up" language for expressing hypertext documents; Java is a language designed for expressing small applications (applets) transmitted over the Internet; Perl is a language designed for transforming text files; and Active VRML, descended from the CAML family of languages, is a language designed by Microsoft for transmitting active virtual reality simulations across networks. One of these languages, Java, is poised to capture a niche far bigger than its initial target, namely general applications programming. Why? What properties make it so attractive?

Thesis 2: Programming languages are invented while you sleep, and spread before you wake up.

This proliferation of languages has made it especially important to understand the design and properties of languages. In particular, many technologies already have languages designed to address them; the user only needs to find the appropriate language.

Thesis 3: Understanding programming languages is the key to most computing enterprises.

This is a much more controversial statement, and hence bears elaboration. Why do we need to understand programming languages?

At the heart of these issues is a fundamental question: What does it mean to understand a programming language? Let us illustrate the problem via an example. Consider the following statement:

set x[i] to x[i] + 1

This is clearly intended to denote the increment of an array element. How would we translate this statement to a variety of different languages, and what would it mean?

In C (circa 1970), we would write this as

x[i] = x[i] + 1;

This performs a hardware lookup for the address of x and adds i (scaled by the size of an array element) to it. The addition is a hardware operation, so it is dependent upon the hardware in question. The resulting address is then dereferenced (if it is legal, which it might not be), 1 is added to the bit-string stored there (again, as a hardware addition, which can overflow), and the result is stored back to that location. However, no attempt is made to determine that x is even a vector or that x[i] is a number.

In Scheme (1975), this would be transcribed as

(vector-set! x i (+ (vector-ref x i) 1))

This does all the things the corresponding C operation does, but in addition it also (a) checks that the object named x is indeed a vector, (b) makes sure i is within the bounds of the vector, (c) ensures the dereferenced location contains a number, and (d) performs abstract arithmetic (so there will be no "overflow").
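
To make these checks concrete, here is a small sketch of a Scheme session (the exact error messages vary across implementations, and each faulty expression is assumed to be evaluated on its own):

(define x (vector 10 20 30))

(vector-set! x 1 (+ (vector-ref x 1) 1))  ; x is now #(10 21 30)
(vector-ref 'not-a-vector 0)              ; error: not a vector         [check (a)]
(vector-ref x 7)                          ; error: index out of range   [check (b)]
(+ (vector-ref (vector 'a) 0) 1)          ; error: not a number         [check (c)]
(+ (expt 2 100) 1)                        ; => 1267650600228229401496703205377,
                                          ; with no overflow            [check (d)]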

Finally, in Java (circa 1991), one might write

x[i] = x[i] + 1;

which looks identical to the C code. However, the actions performed are those performed by the Scheme code, with one major difference: the arithmetic is 32-bit 2's complement with no overflow detection, which is less "abstract" than Scheme's. Scheme performs conventional, unbounded integer arithmetic. Java arithmetic has a precise mathematical description, but it corresponds to the behavior of the basic arithmetic operations on a typical 32-bit computer (when overflow exceptions are suppressed). Hence, in Java, an integer cannot grow arbitrarily large.
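
The difference is easy to observe. Here is a sketch in Scheme, with the corresponding Java behavior noted in a comment (2147483647 is the largest 32-bit 2's complement integer):

(+ 2147483647 1)   ; => 2147483648 in Scheme: the integer simply grows
                   ; in Java, 2147483647 + 1 wraps around to -2147483648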

Thus, we have a table as follows:

                             <---------- Abstraction Level ---------->
  Operation                  Hardware                          Abstract

  Vector lookup              C/C++                             Java, Scheme, SML
  Arithmetic                 C/C++          Java, SML          Scheme

Note that the SML definition leaves the details of arithmetic unspecified, so some implementations provide abstract arithmetic while others offer machine-based numbers (or, possibly, both).

What do we need to know to program in a language? There are three crucial components to any language. The syntax of the language is a way of specifying what is legal in the phrase structure of the language; knowing the syntax is analogous to knowing how to spell and form sentences in a natural language like English. However, this doesn't tell us anything about what the sentences mean.

The second component is the meaning, or semantics, of a program in that language. Ultimately, without a semantics, a programming language is just a collection of meaningless phrases; hence, the semantics is the crucial part of a language.
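
A one-line Scheme example makes the distinction vivid: the phrase below is perfectly legal syntax (it parses as the application of a procedure to an argument), but only the semantics tells us that evaluating it is an error, since car expects a pair:

(car 5)   ; syntactically well-formed, semantically an error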

Finally, as with natural languages, every programming language has certain idioms that a programmer needs to know to use the language effectively. This is sometimes referred to as the pragmatics of the language. Idioms are usually acquired through practice and experience, though research over the past few decades has led to a better understanding of these issues.

Unfortunately, since the syntax is what a programmer first comes into contact with, and continues to deal with most overtly, there is a tendency to over-emphasize the syntactic aspects of a language. Indeed, a speaker at a conference held in Houston in 1968 declared that the field of programming languages was dead, since we had understood everything about languages; the speaker was (largely) correct as far as the syntactic problems were concerned, but failed entirely to consider the semantic issues involved.

There are several ways in which we can approach the study of languages. For instance, we could learn a little of each of several languages that differ in some important aspect or another. There are several shortcomings to such an approach: it is hard to make direct comparisons, since by changing languages we may be changing several parameters at once; also, one would have to become comfortable with several different syntaxes and environments in very short order. To avoid these difficulties, we prefer to start with a single language that we define ourselves, which can then be enhanced in tightly-controlled ways as desired.

Having decided what to study, we must concern ourselves with how we will specify semantics. A natural language like English is a candidate for expressing semantics. However, natural languages are inherently imprecise; they are also unwieldy for expressing intricate details. (Witness, for instance, the descriptions of array incrementing above.) We can be precise in mathematical notation or, just as well, in an existing programming language; the latter offers the advantage of being an executable specification. Therefore, we choose to write programs that evaluate representations of programs in our defined language. Such programs are called interpreters. We prefer to write our interpreters in Scheme because the language makes it easy to prototype new languages that look syntactically similar to it.
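
As a small taste of what is to come, here is a minimal sketch of a definitional interpreter for a toy arithmetic language, assuming programs are represented as Scheme s-expressions such as (add 1 (mul 2 3)); the name interp and the add/mul forms are ours for illustration, not part of any standard:

;; interp: evaluate the representation of a program in the toy language
(define (interp exp)
  (cond ((number? exp) exp)                 ; numbers denote themselves
        ((eq? (car exp) 'add)               ; (add e1 e2): interpret both
         (+ (interp (cadr exp))             ; operands, then use Scheme's +
            (interp (caddr exp))))
        ((eq? (car exp) 'mul)               ; (mul e1 e2): likewise, with *
         (* (interp (cadr exp))
            (interp (caddr exp))))
        (else (error "interp: unknown expression" exp))))

(interp '(add 1 (mul 2 3)))   ; => 7

Note how the interpreter is itself a precise, executable statement of the toy language's semantics; enhancing the language means extending interp in tightly-controlled ways.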

Thus, we can characterize this course, in contrast to comparable offerings at most other institutions, by the following matrix:

                             Study by Breadth     Study in Depth

  Natural languages          Other courses
  Definitional interpreters                       This course

The next question of import is: How does one choose a programming language? More specifically, one might ask, why would one choose C for a task? There are some possibilities: it might offer advantages in real-time systems, and it can often run with a small memory footprint. These criteria are especially valuable for systems that run in constrained environments and control critical machinery (be it a car or a missile).

Why would one use Scheme or Java? In addition to the various language features they offer, these languages also have the advantage of locating bugs sooner than corresponding C programs. This is because each operation checks for errors, so an error is caught close to its physical location (though, of course, this may still be quite removed from the location of the logical error). It is therefore impossible for the program to proceed blissfully with the wrong values, or to terminate without having signalled an error at all. Hence, ceteris paribus, there is a clear likelihood of finding more errors, and sooner, in such languages. Detecting errors early is important for keeping the cost of development down. This shows how technology can have a direct impact on economics.
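
A sketch of the difference, using a hypothetical add1 procedure: in Scheme the misuse below is reported at the offending + itself, whereas in the corresponding C program the bit-pattern of the string would simply be "added to" without complaint, and the wrong value would flow onward:

(define (add1 n) (+ n 1))
(add1 "five")   ; error signalled immediately, at the +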

There is one final question that we must consider: How do we evaluate programming languages? We evaluate them by studying the details of their semantics, identifying the properties that are universal across languages, and considering how each language treats those properties. For instance, two properties that are the subject of much current investigation are the type structure and the memory management of programs. These properties give us coordinates along which languages may be classified.