S104: Perform H.264 standard encoding on the second sampling image of the non-I frame, using as a reference frame the first sampling image of the reconstructed frame of the I frame that corresponds to the non-I frame. Specifically, if the non-I frame is a P frame, the corresponding I frame is the I frame that precedes the P frame in the video frame sequence in play order. If the non-I frame is a B frame, the corresponding I frame is the I frame that precedes the B frame, the I frame that follows the B frame, or both, in the video frame sequence in play order, depending on the prediction mode: if a forward prediction mode is used, the corresponding I frame of the B frame is determined in the same way as for a P frame, that is, it is the preceding I frame in play order; if a backward prediction mode is used, the corresponding I frame is the I frame that follows the B frame in play order; and if a bidirectional prediction mode is used, the corresponding I frames are both the I frame that precedes the B frame and the I frame that follows the B frame in play order. To encode a given non-I frame, the corresponding I frame of the non-I frame is first identified, so that the first sampling image of the reconstructed frame of that I frame can be obtained and used as the reference frame. The H.264 standard encoding is then performed, according to the reference frame, on the second sampling image of the non-I frame obtained from the down-sampling. Specifically, motion estimation is performed in units of macroblocks: for each macroblock of the second sampling image of the non-I frame, the reference frame is searched to find the most closely matching macroblock.
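By way of illustration only, the rule described above for identifying the corresponding I frame may be sketched as follows. The function name `corresponding_i_frames`, the list of frame types, and the mode strings are hypothetical and are not part of the described method. The sketch further assumes that the I frame "prior to" or "after" a frame means the nearest such I frame in play order.

```python
from typing import List, Optional


def corresponding_i_frames(frame_types: List[str], index: int,
                           mode: str = "forward") -> List[int]:
    """Return play-order indices of the I frame(s) used as reference.

    frame_types: frame types in play order, e.g. ["I", "B", "P", "I"].
    index:       position of the non-I frame in that list.
    mode:        prediction mode of a B frame ("forward", "backward" or
                 "bidirectional"); ignored for P frames.
    """
    kind = frame_types[index]
    if kind == "I":
        raise ValueError("an I frame has no corresponding I frame")
    if kind == "P":
        # A P frame always refers to the preceding I frame.
        mode = "forward"

    # Nearest I frame before and after the given frame (assumption: nearest).
    prev_i: Optional[int] = next(
        (i for i in range(index - 1, -1, -1) if frame_types[i] == "I"), None)
    next_i: Optional[int] = next(
        (i for i in range(index + 1, len(frame_types))
         if frame_types[i] == "I"), None)

    candidates = {
        "forward": [prev_i],
        "backward": [next_i],
        "bidirectional": [prev_i, next_i],
    }
    if mode not in candidates:
        raise ValueError(f"unknown prediction mode: {mode}")
    refs = candidates[mode]
    if any(r is None for r in refs):
        raise ValueError("required I frame not present in the sequence")
    return [r for r in refs if r is not None]
```

For a play-order sequence such as I, B, B, P, B, B, I, the sketch returns the index of the first I frame for the P frame, and the indices of both I frames for a B frame encoded in the bidirectional prediction mode.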
Further, a motion vector of the macroblock is obtained, and residual data is obtained by comparing the actual macroblock with the matching macroblock in the reference frame. The residual data is then integer transformed, quantized, reordered, and entropy coded, and is packed together with the motion vector to obtain a data packet suitable for network transmission, that is, the coding result of the non-I frame.
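The motion estimation and residual computation described above may be illustrated by the following minimal sketch. It assumes an exhaustive search over a small window and the sum of absolute differences (SAD) as the matching criterion; neither choice is prescribed by the description, and an actual H.264 encoder may use other search strategies and cost measures. The names `sad` and `motion_estimate`, the `(dy, dx)` vector convention, and the `search` parameter are hypothetical. The subsequent integer transform, quantization, reordering, and entropy coding are omitted.

```python
from typing import List, Tuple

Image = List[List[int]]  # luma samples, indexed as image[row][column]


def sad(cur: Image, ref: Image, cy: int, cx: int,
        ry: int, rx: int, n: int) -> int:
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def motion_estimate(cur: Image, ref: Image, cy: int, cx: int,
                    n: int = 16, search: int = 4
                    ) -> Tuple[Tuple[int, int], Image]:
    """Find the best match in ref for the n x n block of cur at (cy, cx).

    Returns the motion vector (dy, dx) and the residual block, i.e. the
    actual block minus the matching block in the reference frame.
    """
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + n > h or rx + n > w:
                continue
            cost = sad(cur, ref, cy, cx, ry, rx, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    dy, dx = best_mv
    residual = [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
                 for i in range(n)] for j in range(n)]
    return best_mv, residual
```

In this sketch, `ref` would be the first sampling image of the reconstructed frame of the corresponding I frame and `cur` the second sampling image of the non-I frame; with the default `n=16` the block corresponds to one macroblock.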