Rice University
COMP 422
Parallel Computing
Spring 2008
Assignment 2
Complete by 5pm Friday, February 29, 2008

Exploring OpenMP

The goal of this assignment is to explore the use of OpenMP on our Sun Niagara T2000 system (yellowstone.rcsg.rice.edu --- log in using your ADA userid). There are two parts to the assignment.

Part 1: Micro-benchmarking

Download the OpenMP C microbenchmarks from the EPCC OpenMP Microbenchmarks page. Use these benchmarks to evaluate the performance of OpenMP on the T2000 for 8, 16, 32, and 64 threads. Submit a report with your results and a brief commentary on which OpenMP constructs incur more or less overhead than you expected.

Part 2: Gaussian Elimination in OpenMP

Gaussian elimination is a classical method for solving a system of linear equations of the form Ax = b. By performing elementary row operations, Gaussian elimination transforms the square matrix A into an equivalent upper-triangular matrix. Following transformation of A into upper-triangular form, a back substitution phase solves for x. A high-level overview of Gaussian elimination and back substitution can be found on Mathworld.

In this assignment, you will develop a parallel linear solver in OpenMP that uses Gaussian elimination with partial pivoting (partial pivoting is used to enhance numerical stability) to transform a dense n x n matrix into an upper-triangular one and then solve the resulting linear system by back substitution. Below is pseudocode written using Fortran 90 array notation for a sequential implementation of Gaussian elimination with partial pivoting.

    inputs: a(n,n), b(n)
    outputs: a(n,n), b(n) in echelon form

    do j=1,n-1
      ! partial pivoting: find the row k >= j with the largest |a(k,j)|
      ksave = maxloc(abs(a(j:n,j)))
      k = ksave(1) + j-1
      swap a(j,:) and a(k,:)
      swap b(j) and b(k)
      ! eliminate column j from every row below the pivot row
      do k=j+1, n
        m = a(k,j)/a(j,j)
        a(k,:) = a(k,:) - m*a(j,:)
        b(k) = b(k) - m*b(j)
      enddo
    enddo
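
As a starting point, the loop nest above might be written in C (0-based indexing) as sketched below. Parallelizing the row-update loop with an OpenMP work-sharing construct is one natural decomposition, not the required one; the function and variable names are illustrative.

```c
#include <math.h>

/* Sketch of the pseudocode above in C. The matrix a is stored as a C99
 * variable-length array a[n][n]; names follow the pseudocode. */
void gaussian_eliminate(int n, double a[n][n], double b[n])
{
    for (int j = 0; j < n - 1; j++) {
        /* partial pivoting: find the row with the largest |a[k][j]| */
        int piv = j;
        for (int k = j + 1; k < n; k++)
            if (fabs(a[k][j]) > fabs(a[piv][j]))
                piv = k;
        if (piv != j) {           /* swap rows j and piv of a, and b */
            for (int c = 0; c < n; c++) {
                double t = a[j][c]; a[j][c] = a[piv][c]; a[piv][c] = t;
            }
            double t = b[j]; b[j] = b[piv]; b[piv] = t;
        }
        /* eliminate column j from every row below the pivot row;
         * the iterations of this loop are independent, so it is a
         * candidate for an OpenMP parallel for */
        #pragma omp parallel for
        for (int k = j + 1; k < n; k++) {
            double m = a[k][j] / a[j][j];
            for (int c = j; c < n; c++)
                a[k][c] -= m * a[j][c];
            b[k] -= m * b[j];
        }
    }
}
```

Note that the pivot search and row swap on each outer iteration are sequential; only the update of the trailing rows is parallelized in this sketch.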
Your parallel solver implementation should accept two command-line arguments:
  1. n - the dimension size for the matrix
  2. t - the number of threads
Your program will allocate an n x n matrix, fill it with random floating point numbers ranging from 1 to 1000, apply Gaussian elimination with partial pivoting to transform the matrix into an upper-triangular one, and then apply back substitution to compute x. To check your answer, compute the residual vector Ax-b and print its L2-norm, i.e., the square root of the sum of the squares of its components. (The norm should be very small.)

The back substitution and verification steps need not be parallelized. Have your program separately time the Gaussian elimination phase by reading the real-time clock before and afterward and printing the difference.

The formal components of Part 2 of the assignment are listed below:

Submitting your Assignment

Email the instructor your programs and your writeup (in either Word or PDF format) by 5pm Friday, February 29.

Grading Criteria

  1. 25% Writeup for Part 1.
  2. 25% Program correctness for Part 2.
  3. 25% Program scalability and performance for Part 2.
  4. 25% Writeup for Part 2.
Your grade on the writeups reflects the quality of the writing (grammar, sentence structure), whether all of the required elements are included, and the clarity of your explanations.
Parallel efficiency is computed as S/(p * T(p)), where S is the wall clock time of a sequential execution of your program, p is the number of processors, and T(p) is the wall clock time of an execution on p processors. Don't make the mistake of using the execution time of a one-thread version of your parallel implementation as the time of a sequential execution. The execution time of the one-thread parallel implementation is typically larger than that of the original sequential implementation. When you compute parallel efficiency, always use the performance of the original sequential code as a baseline; otherwise, you will overestimate the value of your parallelization.
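
As a concrete instance of the formula: a sequential time of S = 10 s and a parallel time of T(8) = 1.6 s on p = 8 processors gives an efficiency of 10/(8 * 1.6) = 0.78125, i.e., about 78%. A direct transcription:

```c
/* Parallel efficiency as defined above: S is the sequential wall-clock
 * time, p the processor count, Tp the parallel wall-clock time. */
double parallel_efficiency(double S, int p, double Tp)
{
    return S / (p * Tp);
}
```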