A new algorithm for Similarity Preserving Hashing based on mvHash-Damerau Levenshtein for email filtering

Khongbantabam Susila Devi, Dr. Ravi R

Abstract


Handling a large number of files and documents is a very challenging task in IT forensic investigation. To cope with this information overload, forensic investigators use fingerprints in the form of hash values to identify known-to-be-good or known-to-be-bad files. Cryptographic hashing techniques can identify exact duplicates of a file, but a slight change in even a single bit position changes the entire hash value; this problem can be solved by using similarity preserving hashing. We present a new algorithm for similarity preserving hashing, known as enhanced mvHash-Damerau L, which is based on majority voting together with run-length encoding and Bloom filters to represent the fingerprint. In terms of run-time efficiency, the enhanced mvHash-L is superior to mvHash-Damerau L and to the other similarity preserving hashing schemes compared.
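The abstract contrasts cryptographic hashing with similarity preserving hashing and names the building blocks of the proposed scheme. The following Python sketch illustrates the general mvHash idea under stated assumptions: a majority vote over a hypothetical 20-byte neighbourhood, run-length encoding of the resulting vote sequence, and comparison of two fingerprints with the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance. The Bloom-filter encoding and the authors' actual parameters are not reproduced; `NEIGHBOURHOOD`, `fingerprint`, and `similarity` are illustrative names, not taken from the paper.

```python
import hashlib
import random

# Hypothetical parameter: mvHash-style schemes vote over a small byte
# neighbourhood; the value 20 is an assumption for illustration only.
NEIGHBOURHOOD = 20


def majority_vote(data: bytes) -> list[int]:
    """Map each byte to 1 if at least half the bits in its neighbourhood are set, else 0."""
    ones = [bin(b).count("1") for b in data]
    half = NEIGHBOURHOOD // 2
    votes = []
    for i in range(len(data)):
        window = ones[max(0, i - half):i + half]  # window is clipped at the edges
        votes.append(1 if 2 * sum(window) >= 8 * len(window) else 0)
    return votes


def run_length_encode(bits: list[int]) -> list[int]:
    """Lengths of alternating runs, always starting with a (possibly empty) run of 0s."""
    runs, current, count = [], 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs


def damerau_levenshtein(a, b) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    insertions, deletions, substitutions and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def fingerprint(data: bytes) -> list[int]:
    """Hypothetical mvHash-style fingerprint: run lengths of the majority-vote sequence."""
    return run_length_encode(majority_vote(data))


def similarity(fp_a: list[int], fp_b: list[int]) -> float:
    """Score in [0, 1]: 1 minus the distance normalised by the longer fingerprint."""
    return 1.0 - damerau_levenshtein(fp_a, fp_b) / max(len(fp_a), len(fp_b))


if __name__ == "__main__":
    rng = random.Random(42)
    original = bytes(rng.randrange(256) for _ in range(1024))
    modified = bytearray(original)
    modified[500] ^= 0xFF  # flip every bit of a single byte
    modified = bytes(modified)
    unrelated = bytes(rng.randrange(256) for _ in range(1024))

    print(hashlib.sha256(original).hexdigest())
    print(hashlib.sha256(modified).hexdigest())
    print(similarity(fingerprint(original), fingerprint(modified)))
    print(similarity(fingerprint(original), fingerprint(unrelated)))
```

Flipping a single byte yields a completely different SHA-256 digest, whereas it only alters the majority votes within one neighbourhood, so the run-length fingerprint changes locally and its similarity score stays well above that of unrelated data.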

