In some examples, accessing the first and second media data objects may include identifying storage locations of the first and second media data objects and/or loading the first and second media data objects into memory. In some examples, accessing the first and second media data objects may include decoding, interpreting, and/or rendering the first and second media data objects to access the content that they represent. It may be appreciated that the stored binary data of two media data objects could be completely different while the rendered content of the two media data objects is nearly perceptually identical; conversely, the stored binary data of two media data objects could be nearly identical while the rendered content of the two media data objects is perceptually vastly different. Where comparing media data objects is discussed herein, the discussion generally relates to comparing the content as it would be rendered and temporally sequenced, rather than comparing the stored binary data as it would be sequentially stored.
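By way of illustration only, the distinction between stored binary data and rendered content may be sketched as follows. The sketch assumes a hypothetical list of 16-bit audio samples stored under two hypothetical encodings (little-endian and big-endian PCM); the names `samples` and `decode` are illustrative and do not appear elsewhere herein.

```python
import struct

# Hypothetical decoded audio content: 16-bit signed samples.
samples = [0, 1200, -1200, 32767, -32768, 256]

# Two hypothetical stored representations of the same content.
little_endian = struct.pack("<%dh" % len(samples), *samples)
big_endian = struct.pack(">%dh" % len(samples), *samples)


def decode(data, byte_order):
    """Decode raw 16-bit PCM bytes into a list of sample values."""
    count = len(data) // 2
    return list(struct.unpack("%s%dh" % (byte_order, count), data))


# The stored bytes differ, but the decoded content is identical.
binary_equal = little_endian == big_endian
content_equal = decode(little_endian, "<") == decode(big_endian, ">")
```

In this sketch, a byte-level comparison reports the two objects as different, whereas a comparison of the decoded content reports them as the same, which is the kind of comparison generally contemplated herein.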
Furthermore, in some examples, systems described herein may pre-process the first and second media data objects and/or extract one or more features from the first and second media data objects. In some examples, as will be described in greater detail below, these systems may divide the content of the media data objects into segments (e.g., of equal temporal length). Thus, for example, these systems may divide audio content into segments of a specified length (e.g., 4 seconds or 2 seconds). Likewise, these systems may divide video content into segments, with each segment corresponding to one frame of video.
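By way of illustration only, dividing content into segments of equal temporal length may be sketched as follows. The function names `segment_audio` and `segment_video` are hypothetical, and the choice to discard a trailing partial audio segment is an assumption of this sketch rather than a requirement of the systems described herein.

```python
def segment_audio(samples, sample_rate, segment_seconds):
    """Split audio samples into consecutive segments of equal temporal length.

    Assumption: a trailing partial segment, if any, is discarded.
    """
    segment_len = int(round(sample_rate * segment_seconds))
    if segment_len <= 0:
        raise ValueError("segment length must cover at least one sample")
    last_start = len(samples) - segment_len
    return [
        samples[start:start + segment_len]
        for start in range(0, last_start + 1, segment_len)
    ]


def segment_video(frames):
    """Treat each video frame as its own segment."""
    return [[frame] for frame in frames]


# Hypothetical input: 10 seconds of silent audio sampled at 8000 Hz.
audio = [0] * (8000 * 10)
four_second_segments = segment_audio(audio, 8000, 4)
two_second_segments = segment_audio(audio, 8000, 2)
```

With these hypothetical inputs, the 4-second segmentation yields two full segments (the final 2 seconds are discarded under the stated assumption), and the 2-second segmentation yields five segments.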