>>All you need to do is process the contents of your files once, with a suitable cryptographic hash function, and store the resulting digests:
http://en.wikipedia.org/wiki/Cryptographic_hash_function>>
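A minimal sketch of the "hash once, store the digest" idea in Python, assuming you keep the digests in something like a {path: digest} mapping (the function names are hypothetical). Each file is read in fixed-size chunks, so large files never need to fit in memory:

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file's raw bytes in 1 MiB chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps reading until EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_digest_index(paths, algorithm="sha256"):
    """Compute each file's digest once; the resulting mapping is what you store."""
    return {str(p): file_digest(p, algorithm) for p in paths}
```

On Python 3.11 and later, hashlib.file_digest(f, "sha256") performs the same chunked read for an already opened binary file.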
>>If you don't need high security, MD5 is fine. SHA is better. Even SHA-1, with a 160-bit digest, has 2^160 possible digest values, which is over 10^48. The chance of a collision among 50 million (5 x 10^7) files is vanishingly small; if you actually got one, you should notify the crypto community (not kidding).
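To put a number on "vanishingly small", here is a rough sketch using the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)). It assumes the hash behaves like a random function over the digest space, which is an idealisation:

```python
import math


def collision_probability(n_items, digest_bits):
    """Approximate P(at least one collision) among n_items random digests.

    expm1 keeps precision when the probability is far below float epsilon.
    """
    space = 2 ** digest_bits
    exponent = n_items * (n_items - 1) / (2 * space)
    return -math.expm1(-exponent)


n = 5 * 10**7  # 50 million files, as in the post
for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
    print(f"{name:8s} {collision_probability(n, bits):.2e}")
```

For SHA-1 this comes out at roughly 8.6 x 10^-34, which is why an accidental collision would be news.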
>
>No, there is no security requirement, as this is only to store a calculated ID of a file, which will tell us later on whether another file is a duplicate. If it is, I would expect the same calculated ID to be returned.
>
>So, MD5 would do. If I ever need this again, I will rely on that approach; it came as a big surprise this week to find out that the way a file is stored, especially after resizing it, differs from one environment to another.
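A small sketch of using the MD5 digest as a duplicate-detection ID. The in-memory "files" below are made-up stand-ins for real file contents. Note that the digest only matches byte-identical content: a picture that was resized or re-saved differently in another environment gets a different ID, even if it looks the same.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group names whose contents are byte-identical, keyed by MD5 digest.

    `files` maps a name to its raw bytes; only groups with 2+ names are returned.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.md5(data).hexdigest()].append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}


# Hypothetical contents, for illustration only
files = {
    "a.jpg": b"\xff\xd8 pixel data \xff\xd9",
    "copy_of_a.jpg": b"\xff\xd8 pixel data \xff\xd9",
    # Same picture re-saved elsewhere: one byte differs, so the digest differs too
    "a_resaved.jpg": b"\xff\xd8 pixel data!\xff\xd9",
}
print(find_duplicates(files))
```

Here only a.jpg and copy_of_a.jpg end up in the same group.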
CRC is really only useful for error detection (during communications, etc.). For the purposes people discuss on this forum from time to time, cryptographic hash functions should always be used instead.
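One way to see why CRC is a poor file ID: CRC-32 yields only 32 bits, and it is linear, so collisions can also be constructed on purpose. The sketch below compares output sizes and uses the birthday approximation n ≈ sqrt(2 * 2^bits * ln 2) to estimate how many random files give even odds of a collision, again assuming idealised random outputs:

```python
import hashlib
import math
import zlib

data = b"example file contents"

crc = zlib.crc32(data)               # unsigned 32-bit int in Python 3
sha = hashlib.sha256(data).digest()  # 32 bytes = 256 bits

print(f"CRC-32 : {crc:08x} (32 bits)")
print(f"SHA-256: {sha.hex()} ({len(sha) * 8} bits)")


def files_for_even_odds(bits):
    """Approximate item count at which P(collision) reaches about 50%."""
    return math.sqrt(2 * 2**bits * math.log(2))


print(f"CRC-32 : ~{files_for_even_odds(32):,.0f} files")
print(f"SHA-256: ~{files_for_even_odds(256):.2e} files")
```

With CRC-32 the 50% point is only about 77,000 files, far below the 50 million mentioned above.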
Regards. Al