Which programming error messages are the most common? We investigate this question, motivated by the goal of writing error explanations better suited to novices. We consider two very large data sets, one in Python and the other in Java, both combining syntax and run-time errors. We group essentially identical messages and then determine the most common ones. In both data sets, we find that the error message frequencies empirically resemble Zipf-Mandelbrot distributions. We use a maximum-likelihood approach to select the distribution parameters. The fitted parameters give one possible way to contrast languages or compilers quantitatively.
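
As a rough illustration of what a maximum-likelihood fit of a Zipf-Mandelbrot distribution can look like, the sketch below assumes the common parameterization in which the message of rank k has probability proportional to 1/(k + q)^s over a finite set of ranks. The function names, the coarse grid search, and the synthetic counts are assumptions made for this example; they are not taken from the paper, which may use a different parameterization or optimizer.

```python
import math


def zm_log_likelihood(counts, s, q):
    """Log-likelihood of rank-ordered counts under Zipf-Mandelbrot(s, q).

    counts[0] is the count of the most frequent message (rank 1), and so on.
    Assumed model: P(rank = k) is proportional to (k + q) ** -s for k = 1..N.
    """
    n_ranks = len(counts)
    total = sum(counts)
    # Log of the normalizing constant over the finite ranks 1..N.
    log_norm = math.log(sum((k + q) ** -s for k in range(1, n_ranks + 1)))
    weighted = sum(-c * s * math.log(k + q)
                   for k, c in enumerate(counts, start=1))
    return weighted - total * log_norm


def fit_zm_grid(counts, s_values, q_values):
    """Return the (s, q) pair from the given grids with the highest likelihood.

    A grid search is a deliberately simple stand-in for a numerical optimizer.
    """
    best = max((zm_log_likelihood(counts, s, q), s, q)
               for s in s_values
               for q in q_values)
    return best[1], best[2]


# Synthetic counts, made up for illustration only (not data from the paper).
example_counts = [1e6 * (k + 2.0) ** -1.5 for k in range(1, 51)]
s_grid = [i / 10 for i in range(5, 31)]   # 0.5, 0.6, ..., 3.0
q_grid = [i / 2 for i in range(0, 11)]    # 0.0, 0.5, ..., 5.0
s_hat, q_hat = fit_zm_grid(example_counts, s_grid, q_grid)
```

Under these assumptions, fitting the same model to two data sets yields two (s, q) pairs that can be compared directly, which is one way to read the quantitative contrast mentioned above.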

David Pritchard studied computer science and mathematics at MIT and the University of Waterloo, obtaining his PhD in 2010. He taught at Waterloo, EPFL (Switzerland), Princeton University, and the University of Southern California, while developing free software for students to practice and learn introductory programming online. He is currently employed at Google Los Angeles, and continues to volunteer for the Computer Science Circles project, which is hosted by the Centre for Education in Mathematics and Computing in Waterloo.