The person extraction unit 10 detects one or more persons from each frame. The person extraction unit 10 then groups the persons detected from different frames such that persons whose outer appearance feature values (for example, face feature values) are similar to each other by a predetermined level or more belong to the same group. As a result, in a case where the same person is detected from a plurality of frames, the detections of that person are placed in the same group. Accordingly, it is possible to determine in which frames each person detected in the moving image data 100 appears.
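The grouping described above can be illustrated with a minimal sketch. It makes several assumptions that the description does not state: each detection is a pair of a frame index and a feature vector, similarity is measured as cosine similarity, the predetermined level is a threshold of 0.9, and each new detection is compared only against the first feature value stored for each group. The names `cosine_similarity` and `group_detections` are hypothetical and serve only as an illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_detections(detections, threshold=0.9):
    """Greedily group detections whose features are similar enough.

    detections: list of (frame_index, feature_vector) pairs.
    threshold: assumed stand-in for the "predetermined level".
    Returns a list of groups; each group records the feature value of its
    first member and the set of frames in which that person was detected.
    """
    groups = []
    for frame_index, feature in detections:
        for group in groups:
            if cosine_similarity(feature, group["representative"]) >= threshold:
                group["frames"].add(frame_index)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append({"representative": feature, "frames": {frame_index}})
    return groups


# Illustrative data only: two persons detected across four frames.
detections = [
    (0, [1.0, 0.0]),
    (1, [0.99, 0.05]),
    (1, [0.0, 1.0]),
    (3, [0.02, 0.98]),
]
groups = group_detections(detections)
frames_per_person = [sorted(g["frames"]) for g in groups]
print(frames_per_person)
```

In this sketch each group corresponds to one person, and the set of frame indices held by a group indicates the frames in which that person appears.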
Based on the relationship between each of the plurality of time windows and the frames included in each time window, the person extraction unit 10 determines whether or not each person detected in the moving image data 100 appears in each of the plurality of time windows. In a case where a person detected in the moving image data 100 appears in at least one of a plurality of frames included in a first time window, the person is determined as appearing in the first time window.
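The per-window determination can be sketched as follows. This is only an illustration under assumptions not stated in the description: each time window is represented as an inclusive range of frame indices, and the frames in which each person appears are already known as a set. The name `appearance_by_window` and the person identifiers are hypothetical.

```python
def appearance_by_window(frames_per_person, windows):
    """Determine, for each person, whether the person appears in each window.

    frames_per_person: dict mapping a person identifier to the set of frame
        indices in which that person was detected.
    windows: list of (start_frame, end_frame) pairs, both ends inclusive
        (an assumed representation of a time window).
    Returns a dict mapping each person identifier to a list of booleans,
    one per time window.
    """
    result = {}
    for person_id, frames in frames_per_person.items():
        # A person appears in a window if at least one frame in which the
        # person was detected falls inside that window.
        result[person_id] = [
            any(start <= frame <= end for frame in frames)
            for start, end in windows
        ]
    return result


# Illustrative data only: three windows of three frames each.
frames_per_person = {"A": {0, 1, 7}, "B": {4}}
windows = [(0, 2), (3, 5), (6, 8)]
table = appearance_by_window(frames_per_person, windows)
print(table)
```

A single detection in any frame of a window is sufficient for the person to be treated as appearing in that window, which matches the "at least one" condition above.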
As a result, a determination result shown in