Rice University
COMP 422
Parallel Computing
Spring 2008
Assignment 2
Complete by 5pm Friday, February 29, 2008

Exploring OpenMP

The goal of this assignment is to explore the use of OpenMP on our Sun Niagara T2000 system (yellowstone.rcsg.rice.edu --- log in using your ADA userid). There are two parts to the assignment.

Part 1: Micro-benchmarking

Download the OpenMP C microbenchmarks from the EPCC OpenMP Microbenchmarks page. Use these benchmarks to evaluate the performance of OpenMP on the T2000 for 8, 16, 32, and 64 threads. Submit a report with your results and a brief commentary on which OpenMP constructs incur more or less overhead than you expected.

Part 2: Gaussian Elimination in OpenMP

Gaussian elimination is a classical method for solving a system of linear equations of the form Ax = b. By performing elementary row operations, Gaussian elimination transforms the square matrix A into an equivalent upper-triangular matrix. Following transformation of A into upper-triangular form, a back substitution phase solves for x. A high-level overview of Gaussian elimination and back substitution can be found on Mathworld.

In this assignment, you will develop a parallel linear solver in OpenMP that uses Gaussian elimination with partial pivoting (partial pivoting is used to enhance numerical stability) to transform a dense n x n matrix into an upper-triangular one and then solve the resulting linear system by back substitution. Below is pseudocode written using Fortran 90 array notation for a sequential implementation of Gaussian elimination with partial pivoting.

    inputs: a(n,n), b(n)
    outputs: a(n,n), b(n) in echelon form

    do j=1,n-1
      ! partial pivoting: find the row k >= j with the largest |a(k,j)|
      ksave = maxloc(abs(a(j:n,j)))
      k = ksave(1) + j-1
      swap a(j,:) and a(k,:)
      swap b(j) and b(k)
      ! eliminate column j from every row below the pivot row
      do k=j+1, n
        m = a(k,j)/a(j,j)
        a(k,:) = a(k,:) - m*a(j,:)
        b(k) = b(k) - m*b(j)
      enddo
    enddo
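
As a starting point, the loop nest above might be written in C (0-based indexing) as sketched below. Parallelizing the row-update loop with an OpenMP work-sharing construct is one natural decomposition, not the required one; the function and variable names are illustrative.

```c
#include <math.h>

/* Sketch of the pseudocode above in C. The matrix a is stored as a C99
 * variable-length array a[n][n]; names follow the pseudocode. */
void gaussian_eliminate(int n, double a[n][n], double b[n])
{
    for (int j = 0; j < n - 1; j++) {
        /* partial pivoting: find the row with the largest |a[k][j]| */
        int piv = j;
        for (int k = j + 1; k < n; k++)
            if (fabs(a[k][j]) > fabs(a[piv][j]))
                piv = k;
        if (piv != j) {           /* swap rows j and piv of a, and b */
            for (int c = 0; c < n; c++) {
                double t = a[j][c]; a[j][c] = a[piv][c]; a[piv][c] = t;
            }
            double t = b[j]; b[j] = b[piv]; b[piv] = t;
        }
        /* eliminate column j from every row below the pivot row;
         * the iterations of this loop are independent, so it is a
         * candidate for an OpenMP parallel for */
        #pragma omp parallel for
        for (int k = j + 1; k < n; k++) {
            double m = a[k][j] / a[j][j];
            for (int c = j; c < n; c++)
                a[k][c] -= m * a[j][c];
            b[k] -= m * b[j];
        }
    }
}
```

Note that the pivot search and row swap on each outer iteration are sequential; only the update of the trailing rows is parallelized in this sketch.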
Your parallel solver implementation should accept two command-line arguments:
  1. n - the dimension size for the matrix
  2. t - the number of threads
Your program will allocate an n x n matrix, fill it with random floating point numbers ranging from 1 to 1000, apply Gaussian elimination with partial pivoting to transform the matrix into an upper-triangular one, and then apply back substitution to compute x. To check your answer, compute the residual vector Ax-b and print its L2-norm, i.e., the square root of the sum of the squares of its components. (The norm should be very small.)

The back substitution and verification steps need not be parallelized. Have your program separately time the Gaussian elimination phase by reading the real-time clock before and afterward and printing the difference.

The formal components of Part 2 of the assignment are listed below:

Submitting your Assignment

Email the instructor your programs and your writeup (in either Word or PDF format) by 5pm Friday, February 29.

Grading Criteria

  1. 25% Writeup for Part 1.
  2. 25% Program correctness for Part 2.
  3. 25% Program scalability and performance for Part 2.
  4. 25% Writeup for Part 2.
Your grade on the writeups reflects the quality of the writing (grammar, sentence structure), whether all of the required elements are included, and the clarity of your explanations.
Parallel efficiency is computed as S/(p * T(p)), where S is the wall clock time of a sequential execution of your program, p is the number of processors, and T(p) is the wall clock time of an execution on p processors. Don't make the mistake of using the execution time of a one-thread version of your parallel implementation as the time of a sequential execution. The execution time of the one-thread parallel implementation is typically larger than that of the original sequential implementation. When you compute parallel efficiency, always use the performance of the original sequential code as a baseline; otherwise, you will overestimate the value of your parallelization.
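
As a concrete instance of the formula: a sequential time of S = 10 s and a parallel time of T(8) = 1.6 s on p = 8 processors gives an efficiency of 10/(8 * 1.6) = 0.78125, i.e., about 78%. A direct transcription:

```c
/* Parallel efficiency as defined above: S is the sequential wall-clock
 * time, p the processor count, Tp the parallel wall-clock time. */
double parallel_efficiency(double S, int p, double Tp)
{
    return S / (p * Tp);
}
```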