In some optional embodiments, the video stream comprises video streams synchronously acquired from a plurality of views of the game desktop, and the game stage switching module includes: a first determining sub-module, configured to determine, in a video stream from one view, the first image frame of the plurality of continuous image frames in which the specific target object is detected; a second determining sub-module, configured to determine, in a video stream synchronously acquired from another view, a synchronous image acquired synchronously with the first image frame; and a first game stage switching sub-module, configured to, in response to detecting the specific target object in at least one image frame of N image frames acquired before the synchronous image, switch the game progress stage at a time point corresponding to the image frame, among the N image frames acquired before the synchronous image, in which the specific target object is detected earliest.
In some optional embodiments, the first game stage switching sub-module is configured to: in response to not detecting the specific target object in the N image frames acquired before the synchronous image, switch the game progress stage at the time point corresponding to the first image frame.
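By way of illustration only, the cooperation of the first determining sub-module, the second determining sub-module, and the first game stage switching sub-module may be sketched as follows. The sketch rests on assumptions that are not part of the embodiments: per-frame detection results are given as lists of booleans, the two streams are index-aligned so that the synchronous image has the same index as the first image frame, and the names `first_of_run`, `switch_frame_index`, and `min_run` (the number of continuous image frames required) are hypothetical.

```python
from typing import Optional, Sequence


def first_of_run(detections: Sequence[bool], min_run: int) -> Optional[int]:
    """Return the index of the first frame of the earliest run of at least
    `min_run` consecutive detections, or None if no such run exists."""
    run_start: Optional[int] = None
    for i, hit in enumerate(detections):
        if not hit:
            run_start = None  # the run is broken; start over
            continue
        if run_start is None:
            run_start = i
        if i - run_start + 1 >= min_run:
            return run_start
    return None


def switch_frame_index(
    view_a: Sequence[bool],
    view_b: Sequence[bool],
    min_run: int,
    n: int,
) -> Optional[int]:
    """Return the frame index at which the game progress stage is switched.

    view_a: per-frame detection results in the video stream from one view.
    view_b: per-frame detection results in the stream from another view,
            assumed to be index-aligned with view_a (synchronous acquisition).
    """
    first = first_of_run(view_a, min_run)
    if first is None:
        return None  # no qualifying detection, so no switch

    # Assumption: the synchronous image shares the index `first`.
    # Inspect the N frames acquired before it in the other view, oldest first.
    for i in range(max(0, first - n), first):
        if view_b[i]:
            return i  # earliest detection among the N preceding frames

    # No detection in the N preceding frames: switch at the first image frame.
    return first
```

For example, with `view_a = [False, False, False, False, True, True, True]`, `min_run = 3`, and `n = 3`, the first image frame has index 4. If the other view already detects the specific target object at index 2, the switch is placed at index 2; if the other view detects nothing in indices 1 to 3, the switch is placed at index 4.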
In some optional embodiments, the plurality of views includes a side view and a top view, where the side view includes a left side view of the game desktop and/or a right side view of the game desktop.
In some optional embodiments, the object detection module includes: an object detection sub-module, configured to, for each image frame in the video stream, input the image frame into a neural network for target object detection to obtain a detection result, wherein the detection result indicates whether the image frame comprises the specific target object.
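By way of illustration only, the per-frame processing of the object detection sub-module may be sketched as follows. The neural network is represented by a hypothetical callable `detector` that returns (class label, confidence score) pairs for an image frame; this interface, the names `frame_contains_target` and `detect_stream`, and the `score_threshold` parameter are assumptions made for the sketch and are not specified by the embodiments.

```python
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical detector interface: image frame -> list of (label, score).
Detection = Tuple[str, float]
Detector = Callable[[Any], List[Detection]]


def frame_contains_target(
    frame: Any,
    detector: Detector,
    target_label: str,
    score_threshold: float = 0.5,
) -> bool:
    """Detection result for one image frame: True if the network reports the
    specific target object with a score of at least `score_threshold`."""
    return any(
        label == target_label and score >= score_threshold
        for label, score in detector(frame)
    )


def detect_stream(
    frames: Iterable[Any],
    detector: Detector,
    target_label: str,
    score_threshold: float = 0.5,
) -> List[bool]:
    """Apply the detector to each image frame in the video stream, in order,
    and collect one boolean detection result per frame."""
    return [
        frame_contains_target(frame, detector, target_label, score_threshold)
        for frame in frames
    ]
```

The resulting list holds one detection result per image frame, in acquisition order, each indicating whether that frame includes the specific target object.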