HPIOM: Heterogeneous Parallel Input/Output Machines

Based on a chapter written by Matthieu Moy

Presented By Blake Kaplan

Rice University

Where Are We?

We are the intermediate representation.

Motivation

Want to connect the output of Pinapa to a formal verification tool.
A direct connection would mix the semantics of SystemC with the tool, which is undesirable and leads to unmaintainable code.
- We can liken Pinapa's output to the AST stage in a compiler.
- It's too low-level to perform interesting optimizations on and does not generalize well.
The solution is to create an intermediate representation.

Goals

Requirements for the intermediate formalism

Well defined formal semantics
: The goal of this project is to prove theorems about SystemC, if the intermediate representation lacks formal semantics, it's failed.
Powerful enough: The intermediate representation must be able to represent the semantics of the model.
Simple to manipulate: The purpose of this representation is to reason about it.
Executable: Allows for easy verification of correctness against the SystemC engine.
Provable: The ultimate goal is to provide a formal verification of a program.

Introduction to HPIOM

The IF toolkit was too difficult to use for this purpose.
- Too difficult to add the correct backend to it.
No other toolkits were powerful enough and open source
Decision was to implement a new automata called HPIOM
- Inspired by Esterel and Argos

Trade-offs Made

There is always a trade-off between powerful languages and provability.
For HPIOM, the ultimate goal is to prove properties about the programs it describes.
So it errs towards the side of provability.
In particular, it is a finite automata formalism with no dynamic process or dynamic data structures.
- This means LusSy cannot natively represent dynamic data structures.
- Instead, those programs will have to be approximated.

`HPIOM` Basic Concepts

An automaton is a set of control points linked by edges.
Used, for example, to represent steps taken by a regular expression engine.
HPIOM edges have guards and emit signals.
No syntax for HPIOM, only exists as a data structure.
For debugging, the implementation can output a limited representation of the automaton.

Expressions and Conditions

Purely functional.
Expressions can have any type.
Conditions may only evaluate to a boolean.
Guard expressions (on edges) may only be conditions, but there exist converters to allow other types to be converted to boolean.

Types

Boolean: Can either be true or false.
Integer: Signed, unbounded precision, integral representation.
Enumerated types: A type that may take one of several statically defined values.
Arrays: Provided as a convenience over using n variables of the same type.

Automata

Definition: An automaton A is a reactive machine formally defined by a tuple (Q, V, U, q₀, Ν₀, q_f, T, M)where:

Q: a set of control points.
V: The set of variables.
U: Set of unknown values.
q₀ ∈ Q: The initial control point
Ν₀: The intial values of known variables.
q_f: The "final" control point
T: The set of labeled edges between control points.
M: The set of signals that can be sent when moving along an edge.

Example of an Automata

An automata with control points s₁, s₂, s₃ and edges t₁, t₂ and variables x, y.

Edges

Definition: An edge is a tuple (q_s, q_t, G, A, E) where:

q_s and q_t: Source and target control points
G: A condition on V and U and the precence and values of a signal
A: Parallel assignments in the form v := e
E: Signals emitted while taking this edge

Communication Between Automata

Communication via signals (both "pure" and valued).
Signals can be emitted by several edges
Sometimes it's useful to create automata whose purpose it is to simply combine signals from other automata
- One can have two edges predicated on the various input signals and emit a signal from the combination of inputs.

More Complicated Example

Example of a more complicated Automaton

States

Definition: a state s is a tuple (q, Ν) with:

q: A control point.
Ν: The valuation of all variables in the automaton

The set of initial states of A is {(q₀, Ν)} where Ν is a valuation of all variables in A with their default values.

Transitions

Formally, given an edge (q_s, q_t, G, A, E) and U describing the unknown values, the set of transitions is the set of ((q_s, Ν_s), (q_t, Ν_t), G', E', U) where:

U: A valuation of the unknown values of A.
Ν_t: The valuation after apply the substitution A to Ν_s.
G': Condition created by replacing variabled of G by their value in Ν_s.
E': Set of valued signals with the values updated for Ν_s.

Semantics of a Whole System

An HPIOM system is a set of automata (A₁, A₂, ..., A_n).
- A transition of the global system is exactly one step of each component.
The global state is a tuple (s₀, s₁, ..., s_n) containing the states of all of the automata in A.
A global transition is the set of transitions for each individual automata.
- The resulting conditions (G₀, G₁, ..., G_n) must exist.
- G_i must be satisfied given the emissions from the other automata.
- Represented by the tuple
  ((s₀, ..., s_n), (s'₀, ..., s'_n), (E₀, ..., E_n), (U₀, ..., U_n))

Validity Condition

For an HPIOM automata to be valid, it must pass the condition:

If ∃i,j|G_i is a function of a signal emitted by A_j, then G_j is not a function of A_i.

In other words, there is no instantaneous communication amongst automata.
This restriction is brought about as a result of the differing models in the back end targets.

Global Execution

The global execution trace of a system is a sequence of global transitions:
((S₀, S₁, E₁, U₁), (S₁, S₂, E₂, U₂), ..., (S_{n - 1}, S_n, E_n, U_n))

This follows directly from the definitions of global transitions and global states.

Non-determinism

Implicit: When two transitions along different edges are possible in one step.
Explicit: Using unknown_value and non_deterministic_choice

LusSy turns implicit non-determinsm into explicit non-determinism.
A construct set_choice_safe(true) exists to inform the backends that no implicit non-determinism occurs for a given edge.
- Not safe to call if there actually is implicit non-determinism.

Abstracted Types

An abstracted type can have no value.
Used in the translation from SystemC when the translator finds a type that cannot be encoded in HPIOM.
Given an expression E = operator(op₁, op₂, ..., op_n) if one of the operands of E has the unknown type, then E has the semantics of an unknown value.
For an assignment L := E if L is a variable of unknown type, then the assignment doesn't occur and L is replaced with the unknown value in all of its uses in the program.

Other Convenience Constructs

non_deterministic_choice exists to allow the writer a way to explicitly request that only one edge is taken.
in_state(s) exists to allow an edge to be taken if an automaton is in state s.
- Could be replaced by sending a signal on all incoming edges to s and testing for that signal in place of the in_state call.
continuous_signal exports a valued signal from all edges. Used to export a variable from an automaton.
An array is a sequence of T values numbered from 0.

Abstract Addresses

Have to decide how to deal with addresses.
Option 1: Could hand them off to another verification tool and treat them as being opaque structures.
But since we can distinguish the addresses with a high degree of control, we can do better than that.
Treat the address space as a series of segmented intervals, in other words, create an address map.

Abstract Addresses (cont.)

Let the relevant intervals be encoded as II = (R₁, ..., R_n).
Now, we can represent an address as a bit vector (b₁, ..., b_n) with each b_i corresponding to an R_i.
A constant address has exactly 1 b_i with a true value.
As addresses are manipulated, our representation loses information and we might get multiple b_i with true values.
- This means that the address might be in any of the ranges for which b_i is true.
- The worst case is noted ⊤ where ∀i.b_i = 1

Conservativeness

Advantages of Conservativeness

Allows the representation to be "safe" while allowing proofs.
Loss of information is still safe.
Balances expressivity against safety.

Operations on Addresses

An address literal is an array of boolean values: AA = (b_i)_{i ∈ [0, n]} = (b₁, ..., b_n).
AA + N
AA + unsigned
AA + arbitrary value =

Illustration

Comparisons

Equality:
Greater than:
Likewise for comparisons with concret values.
Note is equal to all other addresses and greater than all addresses that do not have the top bit set.

Expression of Properties

Could have used CTL or LTL as a specification language.
Since we're only interested in safety properties, however, can use the fact that our representation is an automaton.
Therefore can assert that our automata never reach a bad state.
Emit signals on bad states and assert that those signals are never emitted.
A property is satisfied if the bad pure signals are never emitted for any execution of the automata.

Other Abstractions of Addresses

The chosen abstraction has the advantage of being usable without modifying the proof engine.
But it sacrifices expressiveness.
Could have used an abstract interpretation in the prover to lose less information.
- Nbac uses a representation based on polyhedra but is expensive.
- Mathias Peron implemented an extension of the intervals domain taking into account equality and inequality.

Composite Tree Pattern

Defines an interface which all elements in the system implement.
Leaves are non-composite objects.
Containers can implement the same operations, typically by delegating to the contained nodes.
In HPIOM, the grammar rules for the tree corresponds directly with the class inheritance diagram.

Composite Tree Pattern (cont.)

For more information, see http://en.wikipedia.org/wiki/Composite_pattern .

Visitor Design Pattern

Problem: Have a bunch of objects (possibly in the Composite Tree Pattern) with extensible behaviors.
Attempted solution without the visitor problem would be to add a function on the objects themselves for each behavior:

interface Base {
    void behavior1();
    void behavior2();
    .
    .
    .
}

class Derived : Base {
    // Implement each behavior.
}

Problems and Solution

Code for each algorithm spread amongst all classes in the program.
Adding a new behavior requires recompiling every file.
External libraries cannot add new behaviors (not very extensible).
Instead, define a visitor interface that has a callback for each type of node.
Behaviors are defined as visitors of the tree.
- Note: The tree is completely ignorant of the visitors.
Can define "base" visitors that contain functionality common to a subclass of behaviors.

Optimizations

Some optimizations are done via visitors that change the tree in some way.
Others are done as the tree is built up.
- Constant folding.
- Node sharing (for singletons, such as true and false.

Future work

Improving memory management:
- Currently, no memory is freed after it is allocated.
- This is OK for this sort of tool since it is short lived.
- But it's good programming practice to free unused memory.
Namespaces, modularity:
- Currently, only Pinapa is separated from the rest of the code.
- HPIOM is mixed in with all of the back-ends.
- Would like to create namespaces to help manage dependencies and get rid of backward dependencies.
Human readable format for HPIOM:
- Maybe use XML???

The End

Up Next: Chapter 6, Bise

October 29, 2007, COMP 607