We have developed a new image hash based on the Marr wavelet. It computes a perceptual hash from edge information, with particular emphasis on corners. It has been shown that the human visual system makes special use of certain retinal cells to distinguish corner-like stimuli, and the belief that this corner information can be used to distinguish digital images motivates the approach. The edge information obtained from the wavelet is compressed into a fixed-length hash of 72 bytes. Binary quantization allows for relatively fast Hamming distance computation between hashes. The following scatter plots show the results on our standard corpus of images. The first plot shows the distances between each image and its attacked counterpart (i.e. the intra distances). The second plot shows the inter distances between altogether different images. While the hash is not designed to handle rotated images, slight rotations still generally fall within a threshold range and can therefore usually be matched as identical. However, the real advantage of this hash is for use with our MVP tree indexing structure. Since it is more descriptive than the DCT hash (72 bytes in length vs. 8 bytes for the DCT hash), far fewer false matches are retrieved for image queries.
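As a rough illustration of why binary hashes are cheap to compare, the following Python sketch counts differing bits between two 72-byte hashes. It is not the pHash implementation: the function names are hypothetical, and the normalized variant is an added convenience that the text above does not describe.

```python
HASH_LENGTH = 72  # bytes, the hash length stated in the text


def hamming_distance(hash_a: bytes, hash_b: bytes) -> int:
    """Count the bits that differ between two equal-length binary hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two bytes disagree; count those 1 bits.
    return sum(bin(a ^ b).count("1") for a, b in zip(hash_a, hash_b))


def normalized_distance(hash_a: bytes, hash_b: bytes) -> float:
    """Hamming distance scaled to [0, 1] (an assumption, for readability)."""
    return hamming_distance(hash_a, hash_b) / (8 * len(hash_a))
```

Two identical hashes give a distance of 0, and two hashes that disagree in every bit give 8 × 72 = 576, or 1.0 after normalization.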
This new hash is available in version 0.8.0.
A variable-length DCT video hash is included in the pHash library as of v0.6. It consists of the DCT image hash applied to a select number of key frames chosen from the original image sequence. The key frames are selected using an adaptive thresholding technique and are based upon a standard frame rate, so that the hash is not fooled by alterations in the frame rate.
Below are two graphs showing the results on ten different videos and their altered counterparts. In this case, the alteration consists of inserting a few blank frames at the beginning and end of each file. We hope to include more types of alterations soon. The first plot shows the similarity between each video and its altered counterpart, the "intras", and the second plot shows the comparisons between altogether different ("dissimilar") videos. There does appear to be a clear threshold line that can be applied.
The following plot represents the intra comparison for the same set of video files, where the frame rate is changed from about 23 fps to a fixed 18 fps.
As of version 0.5, the pHash library supports a hash storage technique that allows for quick and easy access to a database of hash values. Simply put, the anticipated sequence runs something like this:
The technique stores the hash values, or "data points", in a tree-like structure according to their relative distances from chosen vantage points of the data set. This way, the problem of storing and looking up high-dimensional data is avoided, and the number of distance computations that must be made per query is minimized. The user need only specify the distance function.
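To make the idea concrete, here is a minimal Python sketch of a tree with a single vantage point per node. It is a simplification for illustration only, not the MVP tree shipped with pHash: the names `VPNode`, `build_vp_tree` and `range_search` are hypothetical, the random choice of vantage point is an assumption, and the pruning is only valid when the user-supplied distance function is a true metric.

```python
import random


class VPNode:
    """One node of a simplified vantage-point tree."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build_vp_tree(points, distance, rng=None):
    """Recursively partition points around randomly chosen vantage points."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0, None, None)
    dists = [distance(vantage, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]  # median splits the set roughly in half
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, distance, rng),
                  build_vp_tree(outside, distance, rng))


def range_search(node, query, max_dist, distance, found=None):
    """Collect every stored point within max_dist of the query."""
    if found is None:
        found = []
    if node is None:
        return found
    d = distance(query, node.point)
    if d <= max_dist:
        found.append(node.point)
    # Triangle inequality: a subtree is visited only if it can hold a match.
    if d - max_dist <= node.radius:
        range_search(node.inside, query, max_dist, distance, found)
    if d + max_dist > node.radius:
        range_search(node.outside, query, max_dist, distance, found)
    return found
```

With a Hamming distance over integer hashes, for example `lambda a, b: bin(a ^ b).count("1")`, a range query returns the same matches as a linear scan while skipping any subtree that the triangle inequality rules out.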
So far, preliminary test results reveal a 300% improvement over linear search, and the additional storage required amounts to less than 0.05% of the space required for the entire database.
The discrete cosine transform (DCT) is an efficient means to compute a hash from
frequency spectrum data, and the distance calculation is relatively simple. While it is
insufficient to consider image similarity in any semantically meaningful way, it does provide
a hash as an ID for an image, and is robust against minor distortions, like small rotations,
blurring and compression. The graphs below show the Hamming distances (i.e. the
number of bits that differ in the 64-bit hash) for two scenarios: the intra distances,
where both images come from the same source but one is a distorted version of the other, and
the inter distances, where the two compared images are altogether different images.
The main point is that a threshold of twenty-two, T=22, can be applied to determine whether two images
derive from the same source image.
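The threshold test can be sketched in a few lines of Python, treating each 64-bit hash as an integer. The function names are hypothetical, and treating a distance of exactly 22 as a match is an assumption; the text only states that T=22 separates the two cases.

```python
THRESHOLD = 22  # T=22, the threshold suggested in the text
MASK_64 = 0xFFFFFFFFFFFFFFFF


def hamming_distance64(hash_a: int, hash_b: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin((hash_a ^ hash_b) & MASK_64).count("1")


def same_source(hash_a: int, hash_b: int, threshold: int = THRESHOLD) -> bool:
    """Decide whether two hashes likely come from the same source image.

    Assumption: a distance equal to the threshold still counts as a match.
    """
    return hamming_distance64(hash_a, hash_b) <= threshold
```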
Note: Please ignore the x-axis on the second graph. The x-axis is merely a list of comparisons between the specific images.
This method tries to take into account geometric features of the image when extracting a hash value. The idea is to generate a feature vector from the variances of 180 lines drawn through the center of the image, and then compact the feature vector with the discrete cosine transform (DCT). Images can be compared by looking for correlations between the images' hash values. Where the correlation exceeds a threshold level, the two images can be judged to be the same image. From the graphs, a good threshold might be 0.91, but as you can see, a couple of comparisons do fall into a gray area.
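The comparison step can be illustrated with a short Python sketch that takes the peak of a normalized circular cross-correlation between two feature vectors. This is one plausible reading of "looking for correlations", not a copy of the pHash code: the function names are hypothetical, and both the circular shifting and the mean subtraction are assumptions.

```python
import math

THRESHOLD = 0.91  # candidate threshold read off the graphs in the text


def peak_cross_correlation(x, y):
    """Peak of the normalized circular cross-correlation of two vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denom == 0:
        return 0.0  # a constant vector carries no shape information
    # Try every circular shift of y against x and keep the best correlation.
    return max(
        sum(dx[i] * dy[(i - shift) % n] for i in range(n)) / denom
        for shift in range(n)
    )


def is_same_image(x, y, threshold=THRESHOLD):
    """Judge two feature vectors a match when the peak exceeds the threshold."""
    return peak_cross_correlation(x, y) > threshold
```

Searching over shifts means that a feature vector and a circularly shifted copy of it still reach a peak of about 1.0.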
While the histogram-based method is not provided in pHash, it is shown here merely for comparison purposes. Here, the histogram of the image is used as a feature vector for the hash, and comparisons are performed as above, using the cross correlation function.
The next set of graphs shows the intra and inter peak cross-correlation (PCC) values of the histogram-based feature vector. While the intra PCC values are all above a threshold of about 0.84, the inter PCC values show no clear threshold, indicating that the histogram-based approach would lead to a large number of false positives (i.e. images reported as a match that are not).
This hashing method for audio signals extracts a feature vector for every frame of audio from the Bark scale frequency spectrum; that is, it only considers those frequencies to which the human auditory system is most sensitive. Here, frames are 0.21 seconds of audio and overlap by 50% to minimize the impact of time shifts in audio files. As shown in the graphs, there is a clear difference between the inter and intra file comparisons. The distance plots show the distances between two audio signals. A comparison between two audio hashes necessarily produces a vector of distances, one for each potential matching point, as the audio signals could conceivably be matched anywhere in the signals. Thus, the distance that is plotted marks the point at which the distance is least. The confidence score is a rating of how good the match can be considered to be. The attacks, or distortions, applied to the signal for the intra comparisons are mp3 compression at 32 kbps and 128 kbps, and simulated telephone filtering.
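The sliding comparison described above can be sketched in Python as follows. This is a hedged illustration rather than the pHash routine: the function names are hypothetical, representing each frame's hash as a 32-bit integer is an assumption, and the bit error rate is just one possible per-offset distance.

```python
BITS_PER_FRAME = 32  # assumption: one 32-bit hash value per audio frame


def bit_error_rate(frames_a, frames_b):
    """Fraction of differing bits between two equal-length frame sequences."""
    differing = sum(bin(a ^ b).count("1") for a, b in zip(frames_a, frames_b))
    return differing / (BITS_PER_FRAME * len(frames_a))


def distance_vector(hash_a, hash_b):
    """Distance at every offset where the shorter hash fits inside the longer."""
    shorter, longer = sorted((hash_a, hash_b), key=len)
    if not shorter:
        raise ValueError("hashes must contain at least one frame")
    n = len(shorter)
    return [bit_error_rate(shorter, longer[offset:offset + n])
            for offset in range(len(longer) - n + 1)]


def best_match(hash_a, hash_b):
    """Return (offset, distance) at the point where the distance is least."""
    distances = distance_vector(hash_a, hash_b)
    offset = min(range(len(distances)), key=distances.__getitem__)
    return offset, distances[offset]
```

The minimum of the distance vector corresponds to the single value plotted per comparison, and its offset indicates where in the longer signal the best alignment was found.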