A non-I frame coding module 703 is configured to perform H.264 standard encoding on the sampling picture of the non-I frame by using, as a reference frame, the sampling picture of the reconstructed frame of the I frame corresponding to the non-I frame. Specifically, if the non-I frame is a P frame, the corresponding I frame is the I frame preceding the P frame in the play order of the video frame sequence. If the non-I frame is a B frame, the corresponding I frame is the I frame preceding and/or the I frame following the B frame in the play order of the video frame sequence. That is, if the forward prediction mode is used, the corresponding I frame of the B frame is the same as that of a P frame, namely the preceding I frame in the play order; if the backward prediction mode is used, the corresponding I frame is the I frame following the B frame in the play order; and if the bidirectional prediction mode is used, the corresponding I frames are the I frame preceding and the I frame following the B frame in the play order. When the non-I frame coding module 703 encodes a given non-I frame, the corresponding I frame of the non-I frame is first found, so that the first sampling image of the reconstructed frame of that I frame can be obtained and used as the reference frame; then, according to the reference frame, the H.264 standard coding may be performed on the second sampling image of the non-I frame obtained from the down-sampling. Some implementations may further include: performing motion estimation, in units of macroblocks, on the second sampling image of the non-I frame obtained from the down-sampling against the reference frame, so as to find the most closely matching macroblock in the reference frame.
In this way, a motion vector of the macroblock is obtained, and residual data is obtained by comparing the actual macroblock with the matching macroblock in the reference frame. Further, after integer transform, quantization, reordering, and entropy coding, the residual data is packed together with the motion vector, so that a data packet suitable for network transmission, i.e., the coding result of the non-I frame, is obtained.
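By way of illustration only, the selection of the corresponding I frame and the macroblock matching described above may be sketched as follows. The sketch is not the implementation of the non-I frame coding module 703. The function names (reference_i_frames, motion_search), the representation of a sampling image as rows of integer pixel values, the exhaustive search over a small window, and the sum of absolute differences (SAD) as the matching cost are all assumptions made for the example; the description above does not specify a search strategy or a cost measure. The integer transform, quantization, reordering, and entropy coding steps are omitted.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # assumed layout: a sampling image as rows of pixel values


def reference_i_frames(frame_types: Sequence[str], index: int, mode: str) -> List[int]:
    """Return play-order indices of the I frame(s) corresponding to a non-I frame.

    mode: 'forward' (P frame, or B frame in forward prediction mode),
          'backward', or 'bidirectional'.
    """
    prev_i = next((i for i in range(index - 1, -1, -1) if frame_types[i] == "I"), None)
    next_i = next((i for i in range(index + 1, len(frame_types)) if frame_types[i] == "I"), None)
    wanted = {"forward": [prev_i], "backward": [next_i],
              "bidirectional": [prev_i, next_i]}[mode]
    if any(i is None for i in wanted):
        raise ValueError("no I frame available for mode " + mode)
    return wanted


def block(frame: Frame, top: int, left: int, size: int) -> Frame:
    """Cut a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def motion_search(current: Frame, reference: Frame, top: int, left: int,
                  size: int = 16, search: int = 4) -> Tuple[Tuple[int, int], Frame]:
    """Exhaustive block matching within +/- search pixels.

    Returns the motion vector (dy, dx) of the best match in the reference
    frame and the residual (actual block minus matching block).
    """
    target = block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the reference frame
            cost = sad(target, block(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    _, (dy, dx) = best
    match = block(reference, top + dy, left + dx, size)
    residual = [[c - m for c, m in zip(rc, rm)] for rc, rm in zip(target, match)]
    return (dy, dx), residual
```

For example, for a sequence whose frame types in play order are I, B, P, I, a call such as reference_i_frames(["I", "B", "P", "I"], 1, "bidirectional") yields the indices of both the preceding and the following I frame, and motion_search is then applied per macroblock against the first sampling image of the reconstructed frame of each such I frame.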