If the video streams are acquired from three or more views, the synchronous image of the first image frame needs to be found in each of the views. If it is determined that the specific target object is detected in at least one of the N image frames preceding the synchronous images in at least two of the views, then, for each of those views, the time point corresponding to the image in which the target object is first detected among the N image frames preceding that view's synchronous image is determined. The earliest of these time points is then found, and the game progress stage is switched at that earliest time point, that is, at the earliest time at which the specific target object appears in a round of the game. Therefore, determining a game progress stage according to video streams acquired from a plurality of views is more accurate than determining it according to a video stream acquired from a single view.
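The selection of the earliest time point across views can be illustrated with a minimal sketch. It assumes that each view has already been reduced to per-frame detection flags and timestamps for the N image frames preceding its synchronous image; the names `first_detection_time` and `earliest_switch_time` and this data layout are hypothetical and are not taken from the embodiments.

```python
from typing import List, Optional, Sequence, Tuple

# One view: (detection flags, timestamps) for the N frames preceding the
# synchronous image, in chronological order. This layout is an assumption.
ViewWindow = Tuple[Sequence[bool], Sequence[float]]


def first_detection_time(flags: Sequence[bool],
                         timestamps: Sequence[float]) -> Optional[float]:
    """Return the timestamp of the first frame in which the target object
    is detected, or None if it is not detected in any frame."""
    for detected, ts in zip(flags, timestamps):
        if detected:
            return ts
    return None


def earliest_switch_time(views: Sequence[ViewWindow]) -> Optional[float]:
    """Return the earliest first-detection timestamp over all views,
    or None if no view detected the target object."""
    candidates: List[float] = []
    for flags, timestamps in views:
        ts = first_detection_time(flags, timestamps)
        if ts is not None:
            candidates.append(ts)
    return min(candidates) if candidates else None


# Example with three hypothetical views; the third never sees the object.
views = [
    ([False, True, True], [1.0, 2.0, 3.0]),
    ([False, False, True], [1.1, 2.1, 3.1]),
    ([False, False, False], [1.2, 2.2, 3.2]),
]
switch_time = earliest_switch_time(views)
```

In this example the first view detects the object earliest, so the stage would be switched at that view's first-detection timestamp.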
In these embodiments, where the video stream includes video streams synchronously acquired from a plurality of views of the game desktop, the time point for switching the game progress stage can be determined quickly and the rounds of the game can be distinguished from one another, which improves the accuracy of switching the game progress stage.
In some optional embodiments, as shown in 
at step 104, in response to not detecting the specific target object in the N image frames acquired before the synchronous image, switching the game progress stage at the time point corresponding to the first image frame.
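The fallback in step 104 can be sketched as follows, under the assumption that detection results for the N image frames acquired before the synchronous image are available as boolean flags with matching timestamps. The function name `choose_switch_time` and its parameters are hypothetical.

```python
from typing import Sequence


def choose_switch_time(window_flags: Sequence[bool],
                       window_timestamps: Sequence[float],
                       first_frame_time: float) -> float:
    """Pick the time point at which to switch the game progress stage.

    window_flags / window_timestamps describe the N frames acquired before
    the synchronous image, in chronological order (assumed layout).
    first_frame_time is the time point of the first image frame.
    """
    for detected, ts in zip(window_flags, window_timestamps):
        if detected:
            # The target object already appears in an earlier frame,
            # so switch at the first such frame.
            return ts
    # Step 104: no detection in the N preceding frames, so switch at the
    # time point corresponding to the first image frame.
    return first_frame_time


# No detection in the preceding window: fall back to the first image frame.
fallback_time = choose_switch_time([False, False, False], [7.0, 8.0, 9.0], 10.0)
```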