In the embodiments of the present disclosure, if the specific target object is not detected in any of the N image frames before the synchronous images from different views, this indicates that, in the present round of the game, the specific target object first appears at the time point corresponding to the first image frame. Therefore, the time point corresponding to the first image frame is used as the time point for switching the game progress stage.
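The check described above can be sketched as follows. This is a minimal illustration only: the function name `switch_time_if_first_appearance`, the boolean list of per-frame detection results, and the use of a float as the time point are assumptions made for the example and are not defined by the disclosure. The case where the specific target object is detected in one of the preceding frames is outside this sketch, so the function returns `None` for it.

```python
from typing import Optional, Sequence


def switch_time_if_first_appearance(
    preceding_detections: Sequence[bool],
    first_frame_time: float,
) -> Optional[float]:
    """Return the stage-switch time point when the object first appears in the first image frame.

    preceding_detections holds one detection result per preceding frame (N entries),
    True meaning the specific target object was detected in that frame.
    """
    # No detection in any of the N preceding frames: the first image frame is
    # where the specific target object first appears in this round.
    if not any(preceding_detections):
        return first_frame_time
    # The object was already present earlier; this sketch does not handle that case.
    return None
```

For example, with three preceding frames that contain no detection and a first image frame at 12.5 seconds, the call `switch_time_if_first_appearance([False, False, False], 12.5)` yields 12.5 as the switching time point.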
In some optional embodiments, step 102 may include:
for each image frame in the video stream, inputting the image frame into a neural network for target object detection to obtain a detection result, where the detection result indicates whether the image frame includes the specific target object.
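The per-frame detection of step 102 may be illustrated by the sketch below. It assumes the trained neural network is wrapped in a callable that takes one image frame and reports whether the specific target object is present; the names `detect_specific_target` and `stub_detector`, and the dictionary used as a frame, are hypothetical placeholders introduced only so the example runs without a real model.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")


def detect_specific_target(
    frames: Iterable[Frame],
    detector: Callable[[Frame], bool],
) -> List[bool]:
    """Run the detector on every frame and collect one detection result per frame."""
    return [bool(detector(frame)) for frame in frames]


def stub_detector(frame: dict) -> bool:
    """Hypothetical stand-in for the neural network: reads a precomputed label list."""
    return "specific_target" in frame["labels"]
```

In a real deployment the stub would be replaced by a call into the trained detection network, and the resulting list of detection results could be consumed by the logic that determines the time point for switching the game progress stage.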
In the embodiments of the present disclosure, a neural network for target object detection may be pre-established. The neural network may adopt a deep learning architecture such as a Faster Region-based Convolutional Neural Network (Faster R-CNN). In a training process, a plurality of annotated sample image frames are used as input of the neural network, where each of the plurality of sample image frames may include at least one of the target objects, and the output of the neural network is an object detection result for each sample image frame. According to the target object ground truth (e.g., the annotation) on the plurality of sample image frames, network parameters of the neural network are adjusted so that a loss function is minimized, thereby training the neural network for target object detection.
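The training principle described above (adjusting parameters so that a loss computed against the annotations is minimized) can be illustrated with a deliberately simplified sketch. It is not a Faster R-CNN: a one-layer logistic classifier trained by gradient descent on a binary cross-entropy loss is swapped in as a toy stand-in, each sample image frame is assumed to be already reduced to a short feature vector, and each annotation is reduced to 1 (target object present) or 0 (absent). All names and hyperparameters are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def train_presence_classifier(
    samples: Sequence[Sequence[float]],
    annotations: Sequence[int],
    learning_rate: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights and bias by gradient descent on the mean binary cross-entropy loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, annotations):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of presence
            err = p - y  # gradient of the cross-entropy loss with respect to z
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        # Adjust the parameters against the averaged gradient to reduce the loss.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_presence(weights: Sequence[float], bias: float, x: Sequence[float]) -> bool:
    """Return True when the classifier scores the sample as containing the target object."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0.0
```

A real detection network would additionally predict object locations and would be trained with a framework-provided optimizer, but the loop structure (forward pass, comparison with the annotation, parameter update) follows the same pattern.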