In some optional embodiments, the video stream includes video streams synchronously acquired from a plurality of views of the game desktop, and switching the game progress stage at the time point corresponding to the first image frame, among the plurality of continuous image frames, in which the specific target object is detected includes: determining, in the video stream from one view, the first image frame among the plurality of continuous image frames in which the specific target object is detected; determining, in a video stream synchronously acquired from another view, a synchronous image acquired at the same time as the first image frame; and in response to detecting the specific target object in at least one of the N image frames acquired before the synchronous image, switching the game progress stage at the time point corresponding to the earliest of those N image frames in which the specific target object is detected.
In some optional embodiments, the method further includes: in response to the specific target object not being detected in any of the N image frames acquired before the synchronous image, switching the game progress stage at the time point corresponding to the first image frame.
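The multi-view selection of the switching time point, including the fallback to the first image frame, can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: per-frame detection results are assumed to be already available, all views are assumed to share one clock, "synchronously acquired" is assumed to mean that frame i of every view was captured at the same instant (so the synchronous image has the same index as the first image frame), and when more than one other view is present the earliest detection across all of them is taken. The names `Frame`, `switch_time_ms`, `first_index`, and `n` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Frame:
    timestamp_ms: int  # acquisition time; assumed to share one clock across all views
    detected: bool     # per-frame detection result for the specific target object


def switch_time_ms(
    primary: Sequence[Frame],
    others: Sequence[Sequence[Frame]],
    first_index: int,
    n: int,
) -> int:
    """Return the time point (ms) at which to switch the game progress stage.

    primary: frames from the view in which the first image frame was found.
    others: frames from the other views, acquired synchronously with `primary`.
    first_index: index in `primary` of the first image frame of the continuous
        run in which the specific target object is detected (found elsewhere).
    n: number of frames to look back from the synchronous image (the N above).
    """
    earliest: Optional[int] = None
    for view in others:
        # Assumption: frame i in every view was captured at the same instant,
        # so the synchronous image in this view has index first_index.
        sync = first_index
        window = view[max(0, sync - n):sync]  # the N frames acquired before it
        for frame in window:  # chronological order: the first hit is the earliest
            if frame.detected:
                if earliest is None or frame.timestamp_ms < earliest:
                    earliest = frame.timestamp_ms
                break
    if earliest is not None:
        return earliest
    # Fallback: no other view detected the target in its N earlier frames,
    # so switch at the time point of the first image frame itself.
    return primary[first_index].timestamp_ms
```

For example, with frames 40 ms apart, a first image frame at 80 ms in the side view, and a top view that already shows the target object at 40 ms, the stage switch is moved back to 40 ms; if the top view shows nothing in the preceding N frames, the switch stays at 80 ms.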
In some optional embodiments, the plurality of the views includes: a side view and a top view, wherein the side view comprises a left side view of the game desktop and/or a right side view of the game desktop.
In some optional embodiments, detecting a target object in each of the image frames included in the video stream includes: for each image frame in the video stream, inputting the image frame into a neural network for target object detection to obtain a detection result, wherein the detection result indicates whether the image frame contains the specific target object.
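One way to turn a detection network's raw output into the per-frame detection result described above is sketched below. The `detector` callable is a hypothetical stand-in for the neural network (no particular library or model is implied), and the `Detection` record, the class label `target_label`, and the confidence threshold `score_threshold` are assumptions made for illustration; the source does not specify how the network's output is reduced to a yes/no result.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, TypeVar

Image = TypeVar("Image")  # whatever frame type the detector accepts


@dataclass
class Detection:
    label: str    # class predicted by the network (hypothetical)
    score: float  # confidence in [0, 1]


def detect_target_per_frame(
    frames: Iterable[Image],
    detector: Callable[[Image], Sequence[Detection]],
    target_label: str,
    score_threshold: float = 0.5,
) -> List[bool]:
    """Run the detector on every frame and reduce its output to one boolean per frame."""
    results: List[bool] = []
    for frame in frames:
        detections = detector(frame)  # hypothetical neural-network inference call
        # Assumed rule: the frame contains the specific target object if any
        # detection of the target class clears the confidence threshold.
        results.append(
            any(d.label == target_label and d.score >= score_threshold for d in detections)
        )
    return results
```

The resulting list of booleans is the kind of per-frame input that the continuous-frame and multi-view switching logic in the preceding embodiments would consume.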