>As you do run significant numbers (50 mill) be sure to think about hash collisions - ALL hash functions, CRC included, have information loss which might result in false positives. Very early in computing (disc space being VERY costly then) I had built a structure identifying duplicates via 3 different hashes taken together to form the key and even then check for excact duplication and increment trailing integer in case of collisions...
You could explain more about "information loss" and "false positives"?
It seemed that the hash method was a more sufficient method to obtain an ID on the file. Are you saying that doing it twice might result in a different value sometimes?