The goal on this assignment is to explore the use of OpenMP on our Sun Niagara T2000 system (yellowstone.rcsg.rice.edu --- log in using your ADA userid). There are two parts to the assignment.
Part 1: Micro-benchmarking
Download the OpenMP C microbenchmarks from the EPCC OpenMP Microbenchmarks page. Use these benchmarks to evaluate the performance of OpenMP on the T2000 for 8, 16, 32, and 64 threads. Submit a report with your results and a brief commentary of which OpenMP constructs incur less or more overhead than you might have expected.
Part 2: Gaussian Elimination in OpenMP
Gaussian elimination is a classical method for solving a matrix of linear equations of the form Ax = b. By performing elementary row operations, Gaussian elimination transforms the square matrix A into an equivalent upper-triangular matrix. Following transformation of A into upper triangular form, a back substitution phase solves for x. A high level overview of Gaussian elimination and back substitution can be found on Mathworld.
In this assignment, you will develop a parallel linear solver in OpenMP
that uses Gaussian elimination
with partial pivoting (partial pivoting is used to enhance numerical stability) to transform
a dense n x n matrix into an upper-triangular one
and then solve the resulting linear system by back substitution.
Below is pseudocode written using Fortran 90
array notation for a sequential implementation of Gaussian elimination
with partial pivoting.
inputs: a(n,n), b(n) outputs: a(n,n), b(n) in echelon form do j=1,n-1 ksave = maxloc(abs(a(j:n,j))) k = ksave(1) + j-1 swap a(j,:) and a(k,:) swap b(j) and b(k) do k=j+1, n m = a(k,j)/a(j,j) a(k,:) = a(k,:) - m*a(j,:) b(k) = b(k) - m*b(j) enddo enddoYour parallel solver implementation should accept two command-line arguments:
The back substitution and verification steps need not be parallelized. Have your program separately time the Gaussian elimination phase by reading the real-time clock before and afterward and printing the difference.
The formal components of Part 2 of the assignment are listed below:
Use problem size n = 3000 to evaluate the performance of your implementations. Prepare a table that includes your timing measurements for the elimination phase of your implementations on 1, 8, 16, 32, and 64 threads on our T2000 system, yellowstone. Graph the parallel efficiency of your program executions (see below for definition). Plot a point for each of the executions. The x axis should show the number of processors. The Y axis should show your measured parallel efficiency for the execution.