Reference is now made to FIG. 1, which is an exemplary and non-limiting schematic illustration 100 of synchronized human voice 110 and subtitle 120 streams. According to a principle of the invention, audiovisual content that is accompanied by a subtitles file is first analyzed to determine human voice segments, marked in this case HA1 through HA4. In the subtitles stream (which may be a stream, a file, or any other relevant source of subtitles for the audiovisual content), subtitles are provided together with their respective timing information and are marked ST1 through ST4. The timing information may be, for example but not by way of limitation, the start and end times of display, or the start time and duration of display. In FIG. 1 it is seen that each human voice segment HAi (‘i’ being an integer equal to 1 or more) perfectly corresponds to a subtitle STi. This is the case where the system, described in greater detail herein, does not need to perform any kind of subtitle to human voice alignment or synchronization.
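By way of a non-limiting illustration, the synchronized case of FIG. 1 may be sketched in Python as follows. The names `Segment`, `from_start_duration`, and `perfectly_aligned`, the `tolerance` parameter, and the sample timings are assumptions introduced for this sketch only and do not appear in the description above. The sketch assumes that both streams are already available as ordered lists of timed segments, and that subtitle timing may arrive either as start and end times or as a start time and a duration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A timed span in seconds: a voice segment HAi or a subtitle STi."""
    start: float
    end: float


def from_start_duration(start: float, duration: float) -> Segment:
    """Build a segment from the alternative (start time, duration) form."""
    return Segment(start, start + duration)


def perfectly_aligned(voice: List[Segment],
                      subtitles: List[Segment],
                      tolerance: float = 0.0) -> bool:
    """Return True when every HAi matches STi within `tolerance` seconds.

    `tolerance` is a hypothetical parameter; 0.0 demands exact agreement.
    """
    if len(voice) != len(subtitles):
        return False
    return all(
        abs(v.start - s.start) <= tolerance and abs(v.end - s.end) <= tolerance
        for v, s in zip(voice, subtitles)
    )


# Illustrative timings only (HA1..HA4 and ST1..ST4); not taken from FIG. 1.
voice = [Segment(1.0, 3.0), Segment(4.0, 6.5),
         Segment(8.0, 9.0), Segment(10.0, 12.0)]
subtitles = [Segment(1.0, 3.0), from_start_duration(4.0, 2.5),
             Segment(8.0, 9.0), from_start_duration(10.0, 2.0)]

if perfectly_aligned(voice, subtitles):
    print("Streams are synchronized; no alignment is needed.")
```

When such a check succeeds, the streams are in the condition shown in FIG. 1 and no alignment or synchronization step is required; any other outcome corresponds to the cases the system is designed to correct.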