The initial wave of GPU exploitation started with native code written in CUDA and OpenCL by developers familiar with the internals of high-performance computing systems.  Looking to the future, it will be increasingly important to make the performance and energy-efficiency benefits of GPUs and accelerators available to mainstream developers who write code in managed languages such as Java.  The OpenCL standard provides a good starting point in this regard, since it enables portable execution of SIMT kernels across a wide range of platforms, including multi-core CPUs and many-core GPUs.  However, using OpenCL from Java to program GPUs is difficult and error-prone, because programmers are required to manually (1) manage data transfers between the host system and the GPU, (2) write kernels in the OpenCL C language, and (3) use the Java Native Interface (JNI) to access the C/C++ APIs for OpenCL.

In this paper, we propose automatic generation of OpenCL to accelerate programs written in a managed language such as Java.  Our approach automatically generates OpenCL kernel code and JNI glue code from a Java-level parallel loop construct ("forall"), while leveraging an "array-view" construct for rectangular multidimensional arrays and a "next" construct for all-to-all barrier synchronization. Unlike some past approaches for generating CUDA or OpenCL from high-level languages, our approach preserves Java exception semantics by generating dual-version code, consisting of exception-safe and exception-unsafe regions, guided by a "safe" construct.
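To make the programming model concrete, the sketch below shows what a "forall"-style parallel loop might look like at the Java level for a simple vector addition. The names (`forall`, the lambda body per iteration) are illustrative, not the paper's actual API; here `forall` is emulated with `IntStream.parallel()`, whereas the proposed compiler would translate each iteration into one OpenCL work-item and generate the host-side data transfers automatically.

```java
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

public class VectorAdd {
    // Stand-in for a language-level "forall" parallel loop over [lo, hi];
    // in the proposed approach the compiler would lower this to an OpenCL
    // kernel plus JNI glue code, rather than to Java threads.
    static void forall(int lo, int hi, IntConsumer body) {
        IntStream.rangeClosed(lo, hi).parallel().forEach(body);
    }

    public static void main(String[] args) {
        int n = 8;
        double[] a = new double[n], b = new double[n], c = new double[n];
        for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

        // Each iteration is independent and maps naturally to one work-item.
        forall(0, n - 1, i -> c[i] = a[i] + b[i]);

        System.out.println(c[3]);  // prints 9.0  (3 + 6)
    }
}
```

Because the loop body carries no loop-carried dependences, the same source-level loop can run on a multi-core CPU or a GPU without change; the "safe"/"unsafe" distinction in the text determines whether the generated kernel must also preserve Java's array-bounds and null-check exception semantics.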
On an AMD processor, our results show speedups of up to 6.2$\times$ relative to sequential Java when executing on the host 4-core CPU, and of up to 20.4$\times$ when executing on the many-core APU.  Likewise, for an Intel Xeon CPU with an NVIDIA Fermi GPU, the speedups relative to sequential Java were up to 26.9$\times$ on the 12-core CPU and up to 62.1$\times$ on the GPU.  These results demonstrate the potential for significant impact from automatic generation of OpenCL to accelerate programs written in Java and other managed languages.

