The Use of Text Retrieval and Natural Language Processing in Software Engineering
During software evolution, many related artifacts are created or modified. Some of these consist of structured data (e.g., analysis data), some contain semi-structured information (e.g., source code), and many include unstructured information (e.g., natural language text). Software artifacts written in natural language (e.g., requirements and bug reports), together with the comments and identifiers in the source code, largely encode the domain of the software and the developers' knowledge about the system, and they capture design decisions, developer information, and more. Retrieving and analyzing this textual information is therefore essential for supporting a variety of software engineering (SE) tasks. Text retrieval (TR) is a branch of information retrieval (IR) that leverages information stored primarily in the form of text. TR methods have proven well suited to retrieving and analyzing textual data embedded in software or present in other sources. In most SE applications, TR techniques are used in conjunction with natural language processing (NLP) tools. This tutorial presents state-of-the-art TR and NLP techniques used in software engineering and discusses their applications in the field.
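The following is a minimal sketch of how TR and NLP can be combined on software artifacts. It is not a technique taken from this tutorial. A hypothetical bug report serves as a query against a small hypothetical corpus of source files. NLP-style preprocessing splits camelCase and snake_case identifiers and drops a small, assumed stop-word list that includes a few Java keywords. A TF-IDF vector space model with cosine similarity then ranks the files. The file names, corpus contents, and stop words are all invented for illustration, and stemming is deliberately omitted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list);
# it also drops a few Java keywords that carry no domain meaning.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "in", "on", "and", "for", "class", "void"}


def tokenize(text: str) -> list[str]:
    """Split prose and source-code identifiers into lowercase terms."""
    terms = []
    # Non-letters (spaces, underscores, punctuation, digits) act as separators.
    for word in re.findall(r"[A-Za-z]+", text):
        # Split camelCase / PascalCase, e.g. "saveFile" -> "save", "File"
        # and "HTTPServer" -> "HTTP", "Server".
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word):
            part = part.lower()
            if len(part) > 1 and part not in STOP_WORDS:
                terms.append(part)
    return terms


def tf_idf_vectors(docs: dict[str, str]):
    """Build a TF-IDF weighted term vector for every document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    df = Counter()  # number of documents containing each term
    for c in counts.values():
        df.update(c.keys())
    n = len(docs)
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()
    }
    return vectors, idf


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity to the query, best first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the corpus are ignored.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(name, cosine(q_vec, vec)) for name, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical source files, reduced to identifiers and comments.
    corpus = {
        "FileSaver.java": "class FileSaver { void saveFile(Path target) { writeBytes(target); } // save document to disk }",
        "PrintDialog.java": "class PrintDialog { void showPrintPreview() { renderPage(); } // print preview window }",
        "NetworkClient.java": "class NetworkClient { void openHttpConnection(URL url) {} // http request }",
    }
    bug_report = "Error on file save: document never reaches disk"
    for name, score in rank(bug_report, corpus):
        print(f"{name}: {score:.3f}")
```

Only `FileSaver.java` shares terms with the query in this toy corpus, so it ranks first and the other two files score zero. A realistic setup would also apply stemming, so that "saving" matches "save", and would use a fuller stop-word list.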