First, the outline of the present example embodiment will be described. A data processing apparatus 1 of the present example embodiment analyzes text data. The data processing apparatus 1 detects multiple word strings from the text data. A word string is a group composed of multiple words. For example, a word string may be the multiple words in one sentence, in one paragraph, in one chapter, in one article, or on one page. In addition, multiple words grouped by some other unit may also be set as one word string.
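As an illustration only, the detection of word strings can be sketched as follows for the case in which one sentence is treated as one word string. The function name `detect_word_strings`, the sentence-splitting rule, and the lowercase tokenization are assumptions made for this sketch and are not specified by the present example embodiment.

```python
import re


def detect_word_strings(text):
    """Return one list of words per sentence (hypothetical helper).

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace,
    and a word is a run of letters, digits, or apostrophes.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    word_strings = []
    for sentence in sentences:
        words = re.findall(r"[A-Za-z0-9']+", sentence.lower())
        # A word string is a group of multiple words, so single-word
        # and empty sentences are skipped.
        if len(words) >= 2:
            word_strings.append(words)
    return word_strings


if __name__ == "__main__":
    sample = "The printer jams often. Help! Paper jams in the printer."
    for ws in detect_word_strings(sample):
        print(ws)
```

Other units such as a paragraph or a page could be handled in the same way by changing only the splitting rule.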
After detecting multiple word strings, the data processing apparatus 1 groups together the word strings whose similarity to one another is equal to or higher than a predetermined level. In this manner, word strings relating to similar topics can be placed in the same group.
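The grouping step can be sketched, for example, as follows. The present example embodiment does not fix a similarity measure here, so this sketch assumes the Jaccard coefficient over word sets, a threshold of 0.5, and a simple greedy assignment that compares each word string with the first member of each existing group. The names `jaccard` and `group_word_strings` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two word lists (assumed similarity measure)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def group_word_strings(word_strings, threshold=0.5):
    """Greedily group word strings whose similarity is >= threshold.

    Assumption: each group is represented by its first member, and a word
    string joins the first group whose representative is similar enough.
    """
    groups = []
    for ws in word_strings:
        for group in groups:
            if jaccard(ws, group[0]) >= threshold:
                group.append(ws)
                break
        else:
            # No sufficiently similar group exists, so start a new one.
            groups.append([ws])
    return groups


if __name__ == "__main__":
    sample = [
        ["printer", "paper", "jam"],
        ["paper", "jam", "printer", "again"],
        ["screen", "flicker"],
    ]
    for group in group_word_strings(sample):
        print(group)
```

A greedy pass is used only to keep the sketch short; any clustering method that applies a similarity threshold could take its place.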
Thereafter, from among the multiple groups of word strings, the data processing apparatus 1 extracts a group of word strings whose appearance frequency in the text data to be analyzed is equal to or higher than a predetermined level, and outputs information regarding the extracted group of word strings.
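The extraction step can be sketched, for example, as follows. How the appearance frequency is computed is not defined at this point, so this sketch assumes that the frequency of a group is the number of word strings in the group divided by the total number of word strings detected. The function name `extract_frequent_groups`, the parameter `min_frequency`, and the shape of the output records are hypothetical.

```python
def extract_frequent_groups(groups, min_frequency=0.3):
    """Return information on groups whose frequency is >= min_frequency.

    Assumption: frequency = (word strings in the group) / (all word strings).
    """
    total = sum(len(group) for group in groups)
    if total == 0:
        return []
    extracted = []
    for group in groups:
        frequency = len(group) / total
        if frequency >= min_frequency:
            extracted.append(
                {"size": len(group), "frequency": frequency, "members": group}
            )
    return extracted


if __name__ == "__main__":
    sample_groups = [
        [["a", "b"], ["a", "b", "c"], ["a", "b"]],  # 3 of 4 word strings
        [["x", "y"]],                               # 1 of 4 word strings
    ]
    for info in extract_frequent_groups(sample_groups, min_frequency=0.5):
        print(info["size"], info["frequency"])
```

An absolute count could be used as the predetermined level instead of a ratio without changing the structure of the sketch.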
Next, the configuration of the data processing apparatus 1 of the present example embodiment will be described in detail. A functional block diagram of the data processing apparatus 1 of the present example embodiment is shown in