That is, the person extraction unit 10 analyzes the moving image data in units of predetermined time windows, and determines whether or not each person detected in the moving image data appears in each of a plurality of time windows. Then, based on the determination result, the person extraction unit 10 calculates an appearance frequency for each person detected in the moving image data. The person extraction unit 10 then extracts a person whose appearance frequency satisfies the predetermined condition among persons detected in the moving image data.
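The window-based processing described above can be illustrated with a minimal sketch. It is not the implementation of the person extraction unit 10. It assumes that person detection and identification have already produced `(person_id, timestamp)` pairs, that the appearance frequency is the fraction of time windows in which a person appears, and that the predetermined condition is a lower bound on that fraction. The names `extract_frequent_persons`, `window_length`, `total_duration`, and `min_ratio` are hypothetical.

```python
import math
from collections import defaultdict


def extract_frequent_persons(detections, window_length, total_duration, min_ratio):
    """Return (extracted person ids, appearance frequency per person).

    detections: iterable of (person_id, timestamp_in_seconds) pairs
        (assumed output of an upstream detection step).
    window_length: length of each time window in seconds (assumed parameter).
    total_duration: length of the moving image data in seconds.
    min_ratio: assumed condition; a person is extracted when the fraction
        of windows in which the person appears is at least this value.
    """
    num_windows = max(1, math.ceil(total_duration / window_length))

    # For each person, record the set of time windows in which the person appears.
    windows_by_person = defaultdict(set)
    for person_id, timestamp in detections:
        index = min(int(timestamp // window_length), num_windows - 1)
        windows_by_person[person_id].add(index)

    # Appearance frequency: fraction of windows containing the person.
    frequency = {
        person_id: len(windows) / num_windows
        for person_id, windows in windows_by_person.items()
    }

    # Extract the persons whose frequency satisfies the assumed condition.
    extracted = sorted(p for p, f in frequency.items() if f >= min_ratio)
    return extracted, frequency


# Example: 180 seconds of data analyzed in three 60-second windows.
detections = [("A", 5), ("A", 65), ("A", 125), ("B", 10), ("C", 70), ("C", 130)]
extracted, frequency = extract_frequent_persons(detections, 60, 180, 0.6)
```

In this example, person "A" appears in all three windows, person "C" in two, and person "B" in one, so only "A" and "C" satisfy the assumed condition. Counting windows rather than individual detections means that a person who is detected many times within a single window is not treated as appearing frequently.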
Thereafter, the output unit 20 outputs the information regarding the person extracted in S11 (S12). For example, the output unit 20 outputs a face image (acquired from the moving image data) of the person extracted in S11.
According to the present example embodiment described above, a person (person to be retrieved) satisfying a predetermined condition (high appearance frequency) can be retrieved from the moving image data even under circumstances in which the feature value of the person to be retrieved cannot be provided to the apparatus because that person has not been identified.
Although an example has been described in which the moving image data to be analyzed is “moving image data captured at the same place over a predetermined time period”, the moving image data to be analyzed may also be “moving image data captured at a plurality of places over a predetermined time period”. In this case as well, the same advantageous effect can be achieved by the same processing.