>All you need to do is process the contents of your files once, with a suitable cryptographic hash function, and store the resulting digests:
>http://en.wikipedia.org/wiki/Cryptographic_hash_function
>If you don't need high security, MD5 is fine. SHA is better. Even SHA-1, with a 160-bit digest, has 2^160 possible digest values, which is over 10^48. The chance of a collision among 50 million (5 x 10^7) files is vanishingly small; if you actually got one, you should notify the crypto community (not kidding).
No, there is no security requirement here: this is only to store a calculated ID for each file, which will tell us later on whether another file is a duplicate. If it is a duplicate, I would assume that the same calculated ID would be returned.
So MD5 would do. If I ever need this again, I would rely on such an approach, as it came as a big surprise this week when I found out that the way a file is stored, especially after resizing it, differs from one environment to another.
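
For what it's worth, here is a rough sketch of how I imagine the digest approach would look in Python. The function names (file_digest, find_duplicates, collision_probability) and the chunk size are my own choices for illustration, not anything from the thread, and the probability helper uses the usual birthday-bound approximation rather than an exact figure.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file's bytes, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths, algorithm="md5"):
    """Group paths by digest and keep only digests shared by 2+ files."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p, algorithm)].append(Path(p))
    return {digest: ps for digest, ps in groups.items() if len(ps) > 1}


def collision_probability(n_files, digest_bits):
    """Birthday-bound estimate: (number of file pairs) / 2^digest_bits."""
    return n_files * (n_files - 1) / 2 / 2 ** digest_bits
```

With these assumptions, collision_probability(50_000_000, 128) comes out at a few times 10^-24 for MD5, so an accidental collision is not a practical concern. Note that the digest only matches when the bytes are identical: two files that look the same but were stored or resized differently will get different IDs.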