hashing - Any examples of duplicate MD5 Hashes?
2014-07
Since it is theoretically possible for two different inputs to have the same MD5 hash, I was wondering: are there any known examples of this, or is it all just theoretical?
The shortest one I know of is below: two different 128-byte inputs with the same MD5 digest. Collisions do happen, and they need to be taken into consideration when using hashing. Every hash function has a shelf life, and MD5 is near the end of its life. Personally, I only use it for verifying file integrity; it is no longer a secure method for storing information, only for validating it.
Input vector 1:
0000000 d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
0000020 2f ca b5 87 12 46 7e ab 40 04 58 3e b8 fb 7f 89
0000040 55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 71 41 5a
0000060 08 51 25 e8 f7 cd c9 9f d9 1d bd f2 80 37 3c 5b
0000100 d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
0000120 dd 53 e2 b4 87 da 03 fd 02 39 63 06 d2 48 cd a0
0000140 e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 a8 0d 1e
0000160 c6 98 21 bc b6 a8 83 93 96 f9 65 2b 6f f7 2a 70
Input vector 2:
0000000 d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
0000020 2f ca b5 07 12 46 7e ab 40 04 58 3e b8 fb 7f 89
0000040 55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 f1 41 5a
0000060 08 51 25 e8 f7 cd c9 9f d9 1d bd 72 80 37 3c 5b
0000100 d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
0000120 dd 53 e2 34 87 da 03 fd 02 39 63 06 d2 48 cd a0
0000140 e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 28 0d 1e
0000160 c6 98 21 bc b6 a8 83 93 96 f9 65 ab 6f f7 2a 70
Digest:
79054025255fb1a26e4bc422aef54eb4
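To check the collision yourself, here is a minimal Python sketch (Python is my choice, not something from the thread) that rebuilds both inputs from the dumps above and hashes them with the standard hashlib module. The offsets in the left-hand column of the dumps are octal, as printed by od, so 0000020 is byte 16.

```python
import hashlib

# The two 128-byte input vectors from the hex dumps above, 16 bytes per line.
VECTOR_1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c"
    "2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a"
    "085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6"
    "dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1e"
    "c69821bcb6a8839396f9652b6ff72a70"
)
VECTOR_2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c"
    "2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a"
    "085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6"
    "dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1e"
    "c69821bcb6a8839396f965ab6ff72a70"
)

# Decimal byte offsets at which the two inputs differ.
diff_offsets = [i for i, (x, y) in enumerate(zip(VECTOR_1, VECTOR_2)) if x != y]

digest_1 = hashlib.md5(VECTOR_1).hexdigest()
digest_2 = hashlib.md5(VECTOR_2).hexdigest()

print("differing byte offsets:", diff_offsets)
print("MD5 of vector 1:", digest_1)
print("MD5 of vector 2:", digest_2)
print("collision:", VECTOR_1 != VECTOR_2 and digest_1 == digest_2)
```

The inputs differ in six bytes (each time only in the high bit), yet both digests come out equal to the one listed above.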
I'm interested in storing an indicator of file / directory integrity between two archived copies of directories. It's around 1TB of data stored recursively on hard drives. Is there a way using OpenSSL to generate a single hash for all the files that can be used as a comparison between two copies of the data, or at a later point to verify the data has not changed?
You could recursively generate a hash for every file, write those hashes (together with their paths, in sorted order so the result is reproducible) into a single file, then generate a hash of that file.
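A minimal sketch of that hash-of-hashes idea in Python, assuming every entry under the root is a regular file readable by the current user. The helper names file_md5, tree_manifest and tree_digest are made up for illustration. Files are read in 1 MiB chunks so a 1TB tree is never loaded into memory at once, and each manifest line records the relative path next to the digest, so a renamed or moved file also changes the final value.

```python
import hashlib
import os

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large files never sit fully in memory


def file_md5(path):
    """Return the hex MD5 of one file, streamed in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_manifest(root):
    """One 'digest  relative/path' line per file, sorted by path (md5sum-like layout)."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Forward slashes keep the manifest identical across operating systems.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, file_md5(full)))
    entries.sort()  # walk order is filesystem-dependent; sorting makes it reproducible
    return "".join(f"{digest}  {rel}\n" for rel, digest in entries)


def tree_digest(root):
    """MD5 of the manifest: changes if any file's content, name or location changes."""
    # surrogateescape keeps undecodable file-name bytes instead of raising.
    manifest = tree_manifest(root).encode("utf-8", "surrogateescape")
    return hashlib.md5(manifest).hexdigest()
```

Run tree_digest on both copies and compare the results; if they differ, diffing the two manifests shows which file changed. For catching accidental corruption MD5 is adequate, and switching to hashlib.sha256 is a one-line change if deliberate tampering is a concern.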
Running openssl md5 on many files gives one digest per file rather than a single cumulative hash, but you can archive (and compress) the folders first, then compute the hash of each archive:
$ tar -czpf archive1.tar.gz folder1/
$ tar -czpf archive2.tar.gz folder2/
$ openssl md5 archive1.tar.gz archive2.tar.gz
To recursively hash each file:
$ find . -type f -exec openssl md5 {} +
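Doing an MD5 sum on the tar would not work unless all of the metadata (modification times, ownership, permissions, etc.) was identical as well, because tar stores that as part of its archive. The order of entries can also differ between copies, since tar adds files in the order the filesystem lists them.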
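To illustrate, this Python sketch uses the standard tarfile module as a stand-in for the tar command: it builds two in-memory archives holding the same file contents under the same name, differing only in the stored modification time, and compares their MD5 digests. The file name and the two timestamps are arbitrary values chosen for the example.

```python
import hashlib
import io
import tarfile


def tar_bytes(name, content, mtime):
    """Build an uncompressed in-memory tar archive holding one file with the given mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(content)
        info.mtime = mtime  # stored in the tar header, so it is part of the archive bytes
        tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


content = b"same file contents\n"
older = tar_bytes("folder/file.txt", content, mtime=1_400_000_000)
newer = tar_bytes("folder/file.txt", content, mtime=1_400_000_060)

print("MD5 of older archive:", hashlib.md5(older).hexdigest())
print("MD5 of newer archive:", hashlib.md5(newer).hexdigest())
```

The archives differ only in the header's mtime field and the header checksum that covers it, yet that is enough to give different digests even though the extracted file contents are identical.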
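I would probably do an MD5 sum of the contents of all of the files, read in sorted path order. NUL-separated names (-print0, sort -z, xargs -0) keep paths containing spaces or newlines intact, and LC_ALL=C makes the sort order the same on every machine:
find folder1 -type f -print0 | LC_ALL=C sort -z | xargs -0 cat | openssl md5
find folder2 -type f -print0 | LC_ALL=C sort -z | xargs -0 cat | openssl md5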
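The same content-only idea can be sketched in Python with a hypothetical concatenated_content_md5 helper (not part of any tool): walk the tree, sort the paths, and stream every file's bytes into one MD5, roughly like the find | sort | xargs cat | openssl md5 pipeline above. One caveat worth knowing: because only contents are hashed, renaming a file or shifting bytes across a file boundary leaves the digest unchanged; for example, files containing "ab" and "c" hash the same as files containing "a" and "bc". If that matters, hashing paths plus per-file digests, as in the hash-of-hashes approach suggested earlier, avoids it.

```python
import hashlib
import os

CHUNK_SIZE = 1 << 20  # stream files in 1 MiB chunks


def concatenated_content_md5(root):
    """MD5 over every file's bytes, fed in sorted path order (file names are NOT hashed)."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    h = hashlib.md5()
    for path in sorted(paths):  # same root prefix, so this orders by relative path
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
                h.update(chunk)
    return h.hexdigest()
```

Calling it on folder1 and folder2 and comparing the two strings gives the same yes/no answer as comparing the two openssl md5 outputs, with the same blind spot for names and file boundaries.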