In an embodiment of the present disclosure, the video stream acquired from a view of the game desktop may be obtained. For example, the view may be a side view or a top view. In another embodiment of the present disclosure, to improve the accuracy of video stream analysis, video streams synchronously acquired from a plurality of views of the game desktop may be obtained. Optionally, the plurality of views includes a side view and a top view, where the side view includes a left side view of the game desktop and/or a right side view of the game desktop.
As shown in 
At step 102, target object detection is performed on each image frame included in the video stream.
In the embodiments of the present disclosure, the target object includes, but is not limited to, any one or a combination of: a person, a game chip, a game card, a game currency, or the like.
Each video stream includes a plurality of image frames. Each image frame is examined to detect whether a target object is present in it.
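As a non-limiting illustration, the per-frame detection of step 102 may be sketched as follows. The function name `detect_per_frame` and the `detector` callable are hypothetical and do not appear in the disclosure; any detection model that returns the labels of the target objects found in a frame could take the place of `detector`.

```python
from typing import Callable, Iterable, List, Set, TypeVar

Frame = TypeVar("Frame")


def detect_per_frame(
    frames: Iterable[Frame],
    detector: Callable[[Frame], Iterable[str]],
) -> List[Set[str]]:
    """Run the (hypothetical) detector on every frame of one video stream.

    Returns one set of detected target-object labels per frame, in frame
    order. An empty set means no target object was detected in that frame.
    """
    return [set(detector(frame)) for frame in frames]


def stub_detector(frame: str) -> List[str]:
    """Stand-in detector for illustration only.

    Each "frame" here is a comma-separated string of labels; a real system
    would run an object-detection model on image data instead.
    """
    return [label for label in frame.split(",") if label]


labels_per_frame = detect_per_frame(["", "game_chip", "game_chip,game_card"], stub_detector)
```

The result keeps one entry per image frame, so that a later step can relate each detection to the time point of the frame in which it occurred.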
At step 103, in response to detecting a specific target object in at least one image frame in the video stream, a game progress stage is switched at a time point corresponding to a first image frame in a plurality of continuous image frames where the specific target object is detected.
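As a non-limiting illustration, the selection of the switching time point in step 103 may be sketched as follows. The function name `find_stage_switch_time` and the parameter `min_run` (the number of continuous image frames required) are assumptions made for this sketch; the disclosure itself only requires a plurality of continuous image frames.

```python
from typing import Optional, Sequence


def find_stage_switch_time(
    detected: Sequence[bool],
    timestamps: Sequence[float],
    min_run: int = 2,
) -> Optional[float]:
    """Return the time point of the first frame of the earliest run of at
    least `min_run` continuous frames in which the specific target object
    is detected, or None if no such run exists.

    detected[i]   -- whether the specific target object was found in frame i
    timestamps[i] -- time point corresponding to frame i
    min_run       -- assumed threshold; not specified by the disclosure
    """
    if len(detected) != len(timestamps):
        raise ValueError("detected and timestamps must have the same length")
    if min_run < 1:
        raise ValueError("min_run must be at least 1")

    run_start: Optional[int] = None
    for i, hit in enumerate(detected):
        if not hit:
            run_start = None  # the run is broken; start over
            continue
        if run_start is None:
            run_start = i  # first frame of a new run
        if i - run_start + 1 >= min_run:
            return timestamps[run_start]
    return None


# Frame 1 is an isolated detection; frames 3 to 5 form a continuous run.
flags = [False, True, False, True, True, True]
times = [0.00, 0.04, 0.08, 0.12, 0.16, 0.20]
switch_time = find_stage_switch_time(flags, times, min_run=3)
```

In this sketch an isolated detection does not trigger a switch, and the switch is attributed to the time point of the first frame of the run rather than to the frame at which the run is confirmed.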