Development Guide

Hash Functions

First of all, a perceptual hash (phash) is a signature of the perceptual content of an underlying media file. It is worth being clear about what it is not. It cannot recognize particular spoken phrases in audio or particular features in images. It probably cannot even detect similar content in two different source files, e.g. two different photographs of the same person, although it may report a similarity if the lighting and camera angles are nearly identical. Perceptual hashes are mainly for detecting duplicates of the same file in situations where standard cryptographic hashes fail. Because cryptographic hash algorithms compute the signature from the exact byte stream of the file, they return completely different hashes when the file is saved in a different format or when minor image processing alterations are applied. Perceptual hashing instead aims to produce a signature that is robust against such distortions, so that a distance measure between the hashes of perceptually similar files returns a small distance. A threshold on that distance can then be used to decide whether two files are the same or different.
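To make the contrast concrete, here is a toy sketch. It is not pHash's algorithm: byte_hash is 64-bit FNV-1a, a non-cryptographic stand-in for a byte-sensitive hash, and average_hash is a simple "average hash" over a hypothetical 8x8 grayscale image rather than the DCT hash. A uniform brightness increase changes every byte, so byte_hash changes, while average_hash is unaffected (Hamming distance 0) as long as no pixel clips at 255.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy image: 8x8 grayscale pixels, row-major, values 0..255.
using Image8x8 = std::array<std::uint8_t, 64>;

// Byte-sensitive stand-in for a cryptographic hash: 64-bit FNV-1a.
// (FNV-1a is not cryptographic; it only shares the property that any
// change in the input bytes scrambles the output.)
std::uint64_t byte_hash(const Image8x8 &img) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::uint8_t b : img) {
        h ^= b;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Toy perceptual hash ("average hash", not pHash's DCT hash): bit i is set
// when pixel i is brighter than the mean of all 64 pixels.
std::uint64_t average_hash(const Image8x8 &img) {
    unsigned sum = 0;
    for (std::uint8_t p : img) sum += p;
    std::uint64_t h = 0;
    for (int i = 0; i < 64; ++i) {
        // p > sum / 64, written as p * 64 > sum to avoid rounding the mean
        if (img[i] * 64u > sum) h |= std::uint64_t{1} << i;
    }
    return h;
}

// Hamming distance: number of bit positions where two hashes differ.
int hamming64(std::uint64_t a, std::uint64_t b) {
    int n = 0;
    for (std::uint64_t x = a ^ b; x != 0; x &= x - 1) ++n;  // clear lowest set bit
    return n;
}

// Uniform brightness change, clamped to 0..255 (a typical mild edit).
Image8x8 brighten(Image8x8 img, int delta) {
    for (auto &p : img) {
        int v = p + delta;
        p = static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
    return img;
}
```

For example, brightening a gradient image by 10 levels leaves average_hash unchanged, while byte_hash produces an unrelated value.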

pHash can currently compute hashes for audio, video and image files. The first step is to compute a hash value for each media file. The function prototypes for the three methods are:

int ph_dct_imagehash(const char *file, ulong64 &hash);
ulong64* ph_dct_videohash(const char *file, int &Length);
uint32_t* ph_audiohash(float *buf, int N, int sr, int &nbFrames);

The image hash is returned in the hash parameter. Video hashes are returned as an array of ulong64 (64-bit unsigned) values, with the Length parameter set to the number of elements. Audio hashes are returned as an array of uint32_t values, with the nbFrames parameter set to the array length. You must release the memory returned by ph_dct_videohash() and ph_audiohash() with free() when you are finished with it. For the audio hash, you must first read the audio data into a buffer with this function:

float* ph_readaudio(const char *filename, int sr, int channels, int &N);

The recommended settings are sr = 8000 (an 8 kHz sample rate) and a single channel. N is set to the length of the returned buffer. You must free the memory returned by ph_readaudio() with free() when you are finished with it (usually after the call to ph_audiohash()).
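In C++, one way to make sure these buffers are always released with free() is to hand them to a std::unique_ptr with a free()-based deleter as soon as they are returned. The sketch below assumes, as stated above, that the buffers come from malloc(); fake_audiohash is a hypothetical stand-in for ph_audiohash() so that the example compiles without pHash. The same wrapper would apply to the results of ph_readaudio() and ph_dct_videohash().

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Deleter that releases memory with std::free(), for buffers that a C-style
// API allocates with malloc() and documents as "free() when finished".
struct FreeDeleter {
    void operator()(void *p) const noexcept { std::free(p); }
};

// Owning handle for a malloc()'d array of T.
template <typename T>
using CBuffer = std::unique_ptr<T[], FreeDeleter>;

// HYPOTHETICAL stand-in for ph_audiohash(): sets nbFrames and returns a
// malloc()'d array (or nullptr). It is not part of pHash.
std::uint32_t *fake_audiohash(int frames, int &nbFrames) {
    nbFrames = 0;
    if (frames <= 0) return nullptr;
    auto *buf = static_cast<std::uint32_t *>(
        std::malloc(sizeof(std::uint32_t) * static_cast<std::size_t>(frames)));
    if (buf == nullptr) return nullptr;
    for (int i = 0; i < frames; ++i) buf[i] = static_cast<std::uint32_t>(i);
    nbFrames = frames;
    return buf;
}

// Take ownership immediately so free() runs on every exit path,
// including early returns and exceptions.
std::uint64_t sum_of_hash_values(int frames) {
    int nbFrames = 0;
    CBuffer<std::uint32_t> hash(fake_audiohash(frames, nbFrames));
    if (!hash) return 0;  // the call failed; nothing to free
    std::uint64_t total = 0;
    for (int i = 0; i < nbFrames; ++i) total += hash[i];
    return total;
}  // hash is destroyed here and FreeDeleter calls std::free()
```

The design choice is to wrap the raw pointer on the same line as the call, so no code path can forget the matching free().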

pHash also provides image hash functions based on radial projections rather than the discrete cosine transform (DCT), but their results have not proven to be as good as those of the DCT hash.

Once you have the hashes for two files, you can compare them with the following functions:

int ph_hamming_distance(ulong64 hasha, ulong64 hashb);
double* ph_audio_distance_ber(uint32_t *hasha, int Na, uint32_t *hashb, int Nb, float threshold, int block_size, int &Nc);

ph_hamming_distance() counts the bits that differ between two 64-bit hashes; it can be used for image hashes and, element by element, for video hashes. For ph_audio_distance_ber(), the threshold should be around 0.30 (between 0.25 and 0.35) and block_size should be 256. block_size is the number of uint32_t frame hashes compared together as one block when computing the bit error rate (BER) between the two hashes. The function returns a buffer of Nc doubles, a confidence vector that rates the similarity of the two hashes at each possible alignment. The maximum of this confidence vector is a fairly good indication of similarity, with 0.5 as the threshold. You must free the memory returned by ph_audio_distance_ber() with free() when you are finished with it.
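The sketch below illustrates the idea of a confidence vector over alignments. It uses an assumed, simplified scoring rule, not the formula inside ph_audio_distance_ber(): the shorter hash is slid along the longer one, and at each offset every full block of block_size frames is scored by the fraction of frames whose per-frame BER (differing bits divided by 32) is at most threshold. With this rule, unrelated frames (BER near 0.5) rarely pass a 0.30 threshold and score near 0, while a matching alignment scores near 1, so a maximum above 0.5 suggests a match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of differing bits between two 32-bit frame hashes.
int hamming32(std::uint32_t a, std::uint32_t b) {
    int n = 0;
    for (std::uint32_t x = a ^ b; x != 0; x &= x - 1) ++n;
    return n;
}

// ASSUMED, simplified scoring rule (not pHash's exact formula).
// Slides the shorter hash along the longer one; for each offset i, each full
// block of block_size aligned frames scores the fraction of frames whose bit
// error rate (differing bits / 32) is <= threshold, and the block scores are
// averaged. Returns one confidence value per offset.
std::vector<double> confidence_vector(const std::vector<std::uint32_t> &a,
                                      const std::vector<std::uint32_t> &b,
                                      double threshold, int block_size) {
    assert(block_size > 0);
    const std::size_t bs = static_cast<std::size_t>(block_size);
    const std::vector<std::uint32_t> &s = a.size() <= b.size() ? a : b;  // shorter
    const std::vector<std::uint32_t> &l = a.size() <= b.size() ? b : a;  // longer
    const std::size_t nc = l.size() - s.size() + 1;  // number of offsets
    const std::size_t blocks = s.size() / bs;        // full blocks only
    std::vector<double> conf(nc, 0.0);
    if (blocks == 0) return conf;
    for (std::size_t i = 0; i < nc; ++i) {
        double total = 0.0;
        for (std::size_t j = 0; j < blocks; ++j) {
            int matches = 0;
            for (std::size_t k = j * bs; k < (j + 1) * bs; ++k) {
                double ber = hamming32(s[k], l[i + k]) / 32.0;
                if (ber <= threshold) ++matches;
            }
            total += static_cast<double>(matches) / static_cast<double>(bs);
        }
        conf[i] = total / static_cast<double>(blocks);
    }
    return conf;
}

// Best score over all alignments; compare against about 0.5.
double max_confidence(const std::vector<double> &conf) {
    return conf.empty() ? 0.0 : *std::max_element(conf.begin(), conf.end());
}
```

Only full blocks are scored in this sketch, so trailing frames that do not fill a block are ignored.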

If you want to use the radial hashing method for images, compute the hash with:

int ph_image_digest(const char *file, double sigma, double gamma, Digest &dig, int N);

Use sigma = 1.0 and gamma = 1.0 for now. N is the number of projection lines drawn through the image center, covering orientations from 0 to 180 degrees; use 180. Declare a Digest before calling the function, like this:

Digest dig;
ph_image_digest(filename, 1.0, 1.0, dig, 180);

The function returns -1 if it fails, a convention followed by most functions in the pHash library.

To compare two radial hashes, compute the peak of the cross-correlation between them with:

int ph_crosscorr(Digest &x, Digest &y, double &pcc, double threshold=0.90);

The peak of cross correlation between the two vectors is returned in the pcc parameter.
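For intuition, here is a sketch of a peak normalized cross-correlation over all circular shifts of two equal-length coefficient vectors. It is written from the description above rather than taken from pHash, and it uses std::vector<double> in place of the Digest struct. One reason to take the peak over circular shifts is that rotating an image rotates its projection angles, which circularly shifts the radial digest, so a rotated copy still produces a high peak.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Peak normalized cross-correlation between x and every circular shift of y.
// Both vectors are mean-centered, so the result lies in [-1, 1]; it reaches 1
// when y is a circular shift of x, up to a positive scale and an offset.
double peak_crosscorr(const std::vector<double> &x, const std::vector<double> &y) {
    assert(!x.empty() && x.size() == y.size());
    const std::size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= static_cast<double>(n);
    my /= static_cast<double>(n);

    double best = -1.0;
    for (std::size_t d = 0; d < n; ++d) {               // shift amount
        double num = 0.0, sx = 0.0, sy = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double xi = x[i] - mx;
            const double yi = y[(i + n - d) % n] - my;  // y shifted by d
            num += xi * yi;
            sx += xi * xi;
            sy += yi * yi;
        }
        if (sx > 0.0 && sy > 0.0)                       // skip constant vectors
            best = std::max(best, num / std::sqrt(sx * sy));
    }
    return best;
}

// Decision rule for the threshold parameter (0.90 by default above),
// assuming "similar" means the peak exceeds the threshold.
bool is_similar(double pcc, double threshold = 0.90) { return pcc > threshold; }
```

A circularly shifted copy of a vector scores 1.0 up to floating-point rounding, which is why the peak, rather than the unshifted correlation, is used for the comparison.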

MVP Hash Storage

Using the MVP (multi-vantage point) tree functions for hash storage is fairly straightforward. The basic operations are to build the database, add to it, and query it, and the API provides one function for each:

MVPRetCode ph_save_mvptree(MVPFile *m, DP **points, int nbpoints);
MVPRetCode ph_add_mvptree(MVPFile *m, DP *new_dp);
MVPRetCode ph_query_mvptree(MVPFile *m, DP *query, int knearest, float radius, DP **results, int *count);

All of these functions are documented in the pHash.h header file. MVPRetCode is an enumerated type that reports the result: zero indicates success and nonzero values indicate an error. A datapoint, or DP, is a pHash structure that holds a file name and a hash value.
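It can help to see what a query computes without the tree. The sketch below is a brute-force reference, assuming the query returns at most knearest points whose distance from the query is within radius; returning them nearest first is this sketch's choice. Point and hamming_dist are hypothetical stand-ins, not pHash types. An MVP tree aims to return the same kind of answer while skipping most distance computations, using the triangle inequality to prune branches that cannot contain points within radius.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL stand-in for a datapoint: a file name and a 64-bit image hash.
struct Point {
    std::string id;
    std::uint64_t hash;
};

// Hamming distance between the points' hashes, returned as a float like the
// distance callback.
float hamming_dist(const Point &a, const Point &b) {
    int n = 0;
    for (std::uint64_t x = a.hash ^ b.hash; x != 0; x &= x - 1) ++n;
    return static_cast<float>(n);
}

// Brute-force reference query: measures every point, keeps those within
// radius, and returns at most knearest of them, nearest first.
std::vector<const Point *> query_bruteforce(const std::vector<Point> &db,
                                            const Point &query,
                                            int knearest, float radius) {
    assert(knearest >= 0);
    std::vector<std::pair<float, const Point *>> hits;
    for (const Point &p : db) {
        const float d = hamming_dist(p, query);
        if (d <= radius) hits.emplace_back(d, &p);
    }
    // stable_sort keeps database order among equally distant points
    std::stable_sort(hits.begin(), hits.end(),
                     [](const std::pair<float, const Point *> &l,
                        const std::pair<float, const Point *> &r) {
                         return l.first < r.first;
                     });
    if (hits.size() > static_cast<std::size_t>(knearest))
        hits.resize(static_cast<std::size_t>(knearest));
    std::vector<const Point *> results;
    for (const auto &h : hits) results.push_back(h.second);
    return results;
}
```

A reference like this is also a convenient way to sanity-check tree query results on a small test directory.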

First, you will need an MVPFile struct initialized with appropriate values.

MVPFile mfile;
mfile.branchfactor = 2;
mfile.pathlength = 5;
mfile.leafcapacity = 25;

Alternatively, call void ph_init_mvpfile(MVPFile *m) to initialize the fields to these default values; you only need to set these three members yourself if you want to experiment with other values. You will also need to set these fields:

mfile.hash_type = (HashType)type;  /* type: the kind of hash stored in the tree */
mfile.hashdist = funcCB;           /* funcCB: your distance callback */

These values depend on which hash you are using and which distance function you want as the callback. The callback must have this form:

float hashdist(DP *dpA, DP *dpB);
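As an example of such a callback, the sketch below computes the Hamming distance between two datapoints that each hold one 64-bit DCT image hash. The DP struct here is a hypothetical, simplified stand-in that declares only the fields this guide mentions (file name, hash and hash length), and its void *hash field is an assumption; it exists only so the example compiles without pHash. With the library, use the DP from pHash.h and drop the local struct.

```cpp
#include <cassert>
#include <cstdint>

// HYPOTHETICAL, simplified stand-in for pHash's DP datapoint: only the
// fields this guide mentions.
struct DP {
    char *id;              // file name
    void *hash;            // points to the hash value(s)
    uint32_t hash_length;  // number of hash elements
};

// Distance callback of the form float hashdist(DP *dpA, DP *dpB), assuming
// each DP holds a single 64-bit DCT image hash.
float image_hashdist(DP *dpA, DP *dpB) {
    assert(dpA && dpB && dpA->hash && dpB->hash);
    assert(dpA->hash_length == 1 && dpB->hash_length == 1);  // one hash each
    const uint64_t a = *static_cast<uint64_t *>(dpA->hash);
    const uint64_t b = *static_cast<uint64_t *>(dpB->hash);
    int bits = 0;
    for (uint64_t x = a ^ b; x != 0; x &= x - 1) ++bits;  // count differing bits
    return static_cast<float>(bits);
}
```

Such a function would then be assigned to mfile.hashdist.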

To build the database, you need an array of datapoints. The pHash function char** ph_readfilenames(const char* dirname, int &N) reads a given directory and returns the list of file names; the reference parameter N is set to the number of files. You can then loop through the files, compute a hash for each one, and store the hash and file name in a new datapoint. Use DP *dp = ph_malloc_datapoint(mfile.pathlength, mfile.hash_type) to get a pointer to a new DP struct, and be sure to assign the file name, hash and hash_length to the corresponding fields of the DP.

Store the pointers to the newly created DPs in an array of DP pointers. This matters because when the points are sorted into the file format, the datapoints themselves are never moved or reassigned; only the pointers to the original datapoints are rearranged.
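A small self-contained sketch shows why this works: the datapoints are owned in one place, and only a separate array of pointers is reordered, here by distance to a vantage point as a tree builder might do when partitioning. Datapoint, make_view and sort_by_vantage are hypothetical illustrations, not pHash's DP or its internal sorting code, and the std::unique_ptr ownership is this sketch's choice (with pHash you would keep the DP* returned by ph_malloc_datapoint()).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// HYPOTHETICAL simplified datapoint (not pHash's DP).
struct Datapoint {
    std::string id;
    std::uint64_t hash;
};

int hamming64(std::uint64_t a, std::uint64_t b) {
    int n = 0;
    for (std::uint64_t x = a ^ b; x != 0; x &= x - 1) ++n;
    return n;
}

// Non-owning view: one raw pointer per owned datapoint.
std::vector<Datapoint *> make_view(const std::vector<std::unique_ptr<Datapoint>> &owned) {
    std::vector<Datapoint *> view;
    view.reserve(owned.size());
    for (const auto &p : owned) view.push_back(p.get());
    return view;
}

// Reorder the pointers by distance to a vantage point. Only the pointers
// move; each Datapoint stays at the address where it was allocated.
void sort_by_vantage(std::vector<Datapoint *> &view, const Datapoint &vp) {
    std::sort(view.begin(), view.end(), [&vp](const Datapoint *a, const Datapoint *b) {
        return hamming64(a->hash, vp.hash) < hamming64(b->hash, vp.hash);
    });
}
```

After sorting, the view lists the nearer datapoint first, while each owned datapoint is still the same, unmodified object at its original address.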

The examples directory of the download contains build_mvptree.cpp, add_mvptree.cpp and query_mvptree.cpp, which demonstrate each operation for the DCT image hash. To run them, you need a directory containing at least 28 images, i.e. more images than fit in one leaf node of the tree (three more than mfile.leafcapacity). The examples directory also contains three similarly named files for the audio hash.