A Highly Available Parallel Environment for PVM/MPI

Du Shu, Wang Dongsheng
Department of Computer Science, Tsinghua University, Beijing 100084, P.R. China

Clusters of Computers (COCs) are increasingly used to run large parallel programs. Task migration is a useful facility for implementing load balancing and high availability in COCs. This report presents a fast migration protocol for PVM/MPI tasks that allows non-migrating tasks to keep executing during most of the migration. Message buffering and process table updating are the key mechanisms of this protocol. Because MPI implementations make no provision for task migration, this report also describes how we modified an MPICH P4 implementation to allow it. The process management mechanism and the deadlock avoidance technique are the key issues we discuss. Finally, we introduce the high availability features of our system, which is built entirely on these ideas and techniques. The system not only recovers from process errors and node failures, but also implements a simple load-balancing algorithm that automatically takes advantage of newly available computing nodes.

Keywords: Cluster of Computers (COC), High Availability, MPI, Process Migration, Checkpointing