Research directions are typically driven by insights from real-world data. Programming language research, however, is often driven by the perceived needs of users, perceived problems with existing languages and designs, or intuition gained from using a language for many years. There have been many previous studies on programming language feature usage. Such studies typically mine an existing body of software for uses of a language feature, apply some metrics to the mined data, and draw conclusions from the results. This is usually done on a very small number of projects, because studying a large number of projects presents many challenges: where to find that many software projects, how to download and manage such a large amount of data, how to represent the data so that it is easy to store and query, and how to query it efficiently. Despite these challenges, researchers would like to analyze a larger body of software so that their results generalize.
This tutorial will present Boa, a language and infrastructure that addresses these challenges and allows researchers to easily query a very large body of open-source software. Boa provides a declarative, domain-specific language that makes it easy to express queries about software. Queries are submitted through Boa's website, which transforms them into distributed (Hadoop) programs that run on our cluster. Thus, although our dataset contains hundreds of thousands of open-source projects, the sequential-looking queries execute in minutes. Programming language researchers need not worry about finding and gathering data, figuring out how to process it, storing it, or processing it in a timely manner, because Boa handles all of that. This makes it possible for researchers to finally query large amounts of real-world data.
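Turning a sequential-looking query into a distributed program follows the MapReduce pattern. The query body runs independently on each project, and the values it emits are combined by an output aggregator. The Python sketch below is a toy, single-process model of that idea. It is not Boa's implementation or API, and the project records, the `query` and `run` functions, and the `programming_languages` field are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical, heavily simplified project record; real Boa projects carry
# far richer data (repositories, revisions, source code, ...).
Project = Dict[str, List[str]]


def query(project: Project) -> Iterator[Tuple[str, int]]:
    """Sequential-looking per-project query: emit 1 for each language a project uses.

    Loosely analogous to a Boa query that reads one project as its input
    and sends values to an output variable with a `sum` aggregator.
    """
    for lang in project.get("programming_languages", []):
        yield lang, 1


def run(projects: Iterable[Project],
        q: Callable[[Project], Iterator[Tuple[str, int]]]) -> Dict[str, int]:
    # Map phase: the query runs independently on each project, which is what
    # lets a framework such as Hadoop spread projects across many machines.
    emitted = [pair for p in projects for pair in q(p)]
    # Reduce phase: combine the emitted values per key with a sum aggregator.
    totals: Dict[str, int] = defaultdict(int)
    for key, value in emitted:
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    projects = [
        {"programming_languages": ["Java", "C"]},
        {"programming_languages": ["Java"]},
        {"programming_languages": []},
    ]
    print(run(projects, query))  # {'Java': 2, 'C': 1}
```

Because `query` only ever looks at a single project, the researcher writes what reads like a simple loop body, and the infrastructure is free to decide how to partition the projects and where to combine the results.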
By the end of the tutorial, attendees will be able to use Boa to perform research on programming language usage in the wild. We have previously used Boa to perform a large study on the adoption of Java’s programming language features over time. The tutorial will use actual queries from this study as examples and teach researchers how they too can query hundreds of thousands of open-source projects to see how programming language features are used in practice.
Slides: splash15-tutorial.pptx (PPTX, 5.96 MiB)