Frequency Distribution of Error Messages
Which programming error messages are the most common? We investigate this question, motivated by writing error explanations better suited to novices. We consider two very large data sets, one in Python and the other in Java, both combining syntax and run-time errors. We group essentially identical messages and then determine the most common ones. In both data sets, we find that the error message frequencies empirically resemble Zipf-Mandelbrot distributions. We use a maximum-likelihood approach to select the distribution parameters. This gives one possible way to contrast languages or compilers quantitatively.
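As a rough illustration of the pipeline the abstract describes, the sketch below groups near-identical messages, ranks them by count, and picks Zipf-Mandelbrot parameters by maximum likelihood. It assumes the common parameterization in which the message of rank k has probability proportional to (k + q)^(-s). The `normalize` rule, the function names, and the use of a plain grid search are illustrative assumptions, not the procedure used in the paper.

```python
import math
import re
from collections import Counter


def normalize(message):
    """Collapse details that vary between otherwise identical messages.

    Hypothetical grouping rule: mask quoted identifiers and numbers.
    """
    message = re.sub(r"'[^']*'", "'<name>'", message)  # quoted identifiers
    return re.sub(r"\d+", "<n>", message)              # line numbers, sizes


def rank_counts(messages):
    """Return the counts of grouped messages, most common first."""
    grouped = Counter(normalize(m) for m in messages)
    return [count for _, count in grouped.most_common()]


def log_likelihood(counts, s, q):
    """Log-likelihood of ranked counts under p(k) ~ (k + q)^(-s), k = 1..N."""
    n = len(counts)
    log_norm = math.log(sum((k + q) ** -s for k in range(1, n + 1)))
    return sum(
        c * (-s * math.log(k + q) - log_norm)
        for k, c in enumerate(counts, start=1)
    )


def fit_zipf_mandelbrot(counts, s_grid, q_grid):
    """Pick the (s, q) pair on the grid that maximizes the likelihood."""
    candidates = ((s, q) for s in s_grid for q in q_grid)
    return max(candidates, key=lambda p: log_likelihood(counts, *p))


# Usage with made-up messages (not data from the paper).
messages = [
    "name 'x' is not defined",
    "name 'total' is not defined",
    "division by zero",
]
counts = rank_counts(messages)
s_hat, q_hat = fit_zipf_mandelbrot(counts, [1.0, 1.5, 2.0], [0.0, 1.0, 2.0])
```

A grid search keeps the sketch dependency-free; a numerical optimizer over s and q would be the more usual choice on real data. Comparing the fitted (s, q) pairs across data sets is one way to carry out the quantitative contrast the abstract mentions.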
Attached file: Frequency Distribution of Error Messages (plateau2015-pritchard.pdf), 960 KiB
David Pritchard studied computer science and mathematics at MIT and the University of Waterloo, obtaining his PhD in 2010. He taught at Waterloo, EPFL (Switzerland), Princeton University, and the University of Southern California, while developing free software for students to practice and learn introductory programming online. He is currently employed at Google Los Angeles and continues to volunteer for the Computer Science Circles project, which is hosted by the Centre for Education in Mathematics and Computing in Waterloo.
Mon 26 Oct
|10:30 - 10:50|
David Pritchard (University of Waterloo, Canada)
|10:50 - 11:10|
Milan Kabáč (University of Bordeaux / Inria Bordeaux / LaBRI), Nic Volanschi (Inria Bordeaux), Charles Consel (University of Bordeaux)
Andrei Chiş (University of Bern, Switzerland), Tudor Gîrba (tudorgirba.com, Switzerland), Oscar Nierstrasz (University of Bern, Switzerland)