Vectorization of Apply to Reduce Interpretation Overhead of R
R, a dynamic scripting language designed for statistical computing, has grown in popularity in recent years. However, its low performance, caused by inefficiencies in its interpretation, limits its usability. The conventional approaches to improving performance are to build a new, more efficient interpreter or to apply Just-In-Time (JIT) compilation techniques. Both require significant changes to the interpreter and thus impose a high barrier to adoption by development teams.
This paper presents an approach that reduces the interpretation overhead of R by vectorizing the widely used Apply class of operations. The normal implementation of Apply incurs a large interpretation overhead because it iteratively applies the input function to each element of the input data. Our approach combines data transformation and function vectorization to convert this looping-over-data execution into vector operations. The conversion can significantly speed up the execution of Apply operations in R.
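As a rough illustration of the idea, the following hand-written R sketch contrasts an element-wise Apply call with an equivalent vector operation. It is not the package's interface or its generated code: the names `points`, `norm1`, and `norm_vec` are hypothetical, and both the data transformation and the vectorized function are written out manually here, on the assumption that the scalar function uses only arithmetic that R already evaluates element-wise on vectors.

```r
# Illustrative sketch only: a manual analogue of the transformation,
# not the package's actual API or output.

# A list of records, the kind of input an Apply operation iterates over.
points <- list(list(x = 3, y = 4), list(x = 6, y = 8), list(x = 5, y = 12))

# Scalar function: Euclidean norm of a single record.
norm1 <- function(p) sqrt(p$x^2 + p$y^2)

# Element-wise Apply: one interpreted call to norm1 per record.
looped <- sapply(points, norm1)

# Data transformation (done by hand here): list of records -> one vector per field.
xs <- vapply(points, function(p) p$x, numeric(1))
ys <- vapply(points, function(p) p$y, numeric(1))

# Vectorized function: the same arithmetic on whole vectors, in a single call.
norm_vec <- function(x, y) sqrt(x^2 + y^2)
vectorized <- norm_vec(xs, ys)

# Both forms compute the same result.
stopifnot(isTRUE(all.equal(looped, vectorized)))
```

In the looped form the interpreter evaluates the body of `norm1` once per record, whereas in the vectorized form it evaluates the body once and the per-element work happens inside R's built-in vector arithmetic.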
We implemented the vector transformation as an R package that can be invoked by the standard GNU R interpreter and requires no modification to the interpreter itself. The evaluation shows that the transformed code achieves an average speedup of 15x for iterative methods and 5x for direct methods across a suite of data analysis algorithm benchmarks, without any native code generation and while still using only a single thread of execution.