Vectorization of Apply to Reduce Interpretation Overhead of R (SPLASH 2015 - OOPSLA Artifacts)

Fri 23 - Fri 30 October 2015 Pittsburgh, Pennsylvania, United States

Who

Haichuan Wang, David Padua, Peng Wu

Track

SPLASH 2015 OOPSLA Artifacts

Abstract

R, a dynamic scripting language designed for statistical computing, has grown in popularity in recent years. However, the low performance of R, caused by inefficiencies in its interpretation, limits its usability. The conventional approaches to improving its performance are to build a new, more efficient interpreter or to apply Just-In-Time (JIT) compilation techniques. Either approach requires significant changes to the interpreter and thus imposes a high barrier to adoption by development teams.

This paper presents an approach that reduces the interpretation overhead of R by vectorizing the widely used Apply class of operations. The normal implementation of Apply incurs a large interpretation overhead because it iteratively applies the input function to each element of the input data. Our approach combines data transformation and function vectorization to convert this looping-over-data execution into vector operations. This conversion can significantly speed up the execution of Apply operations in R.

We implemented the vector transformation as an R package that can be invoked by the standard GNU R interpreter. It does not require any modification to the R interpreter itself. The evaluation shows that, across a suite of data analysis algorithm benchmarks, the transformed code achieves an average speedup of 15x for iterative methods and 5x for direct methods, without any native code generation and while still using only a single thread of execution.
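The contrast between looping-over-data execution and vector operations described in the abstract can be illustrated with a small hand-written sketch. This is not the authors' package or its API; the record layout and the names `points`, `dist2`, and `dist2_vec` are hypothetical, chosen only to show the two steps (data transformation, then function vectorization) done manually in base R.

```r
# Hypothetical input: a list of records, each with numeric fields x and y.
points <- list(list(x = 1, y = 2), list(x = 3, y = 4), list(x = 5, y = 6))

# Hypothetical per-element function, as it would be passed to an Apply operation.
dist2 <- function(p) p$x^2 + p$y^2

# Looping-over-data form: the interpreter evaluates dist2 once per record.
looped <- sapply(points, dist2)

# Data transformation (done by hand here): list of records -> one vector per field.
xs <- vapply(points, function(p) p$x, numeric(1))
ys <- vapply(points, function(p) p$y, numeric(1))

# Function vectorization (done by hand here): the same arithmetic on whole vectors,
# so the work runs inside R's built-in vectorized operators in a single call.
dist2_vec <- function(x, y) x^2 + y^2
vectorized <- dist2_vec(xs, ys)

# Both forms compute the same values.
stopifnot(isTRUE(all.equal(looped, vectorized)))
```

In this sketch the transformation of the data is itself written with a loop-like `vapply`; it pays off only when the transformed vectors are reused, for example across the iterations of an iterative algorithm.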

Haichuan Wang

University of Illinois at Urbana-Champaign

China

David Padua

University of Illinois at Urbana-Champaign

United States

Peng Wu

Huawei America Lab

China
