>>>The CRC, or any hash or encryption should be done on the file contents - not the size
>>
>>Very well
>>
>>>A CRC is a hash. Since the whole file has to be read in order to calculate the hash, the longer the file, the longer it takes to calculate the hash.
>>>
>>>Note that the time is not that significant. I can encrypt (Rijndael) at speeds of 25 to 35 MB/sec, and this is a block cipher with multiple rounds per block (10, 12 or 14) and lookup tables.
>>
>>Ok, thanks. If I ever have to implement such detection, I will make it not OS-specific.
>
>As you run significant numbers (50 million), be sure to think about hash collisions. ALL hash functions, CRC included, lose information, which can result in false positives. Very early in computing (disk space being VERY costly then) I built a structure that identified duplicates via 3 different hashes taken together to form the key, and even then checked for exact duplication and incremented a trailing integer in case of collisions...
>As you run significant numbers (50 million), be sure to think about hash collisions -
That's why I suggested MD5.
But the images are specific per customer/project, hence not that many per customer/project. A collision is very unlikely.
Gregory
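
For anyone who wants to try the approach discussed in this thread, here is a minimal sketch in Python. It hashes the file contents (not the size) with MD5 and then confirms each candidate with a byte-for-byte comparison, so a hash collision cannot produce a false positive. The function names `md5_of_file` and `find_duplicates` and the 1 MB chunk size are my own choices for illustration, not something from the posts above.

```python
import filecmp
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file contents, reading in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[md5_of_file(path)].append(Path(path))

    groups = []
    for candidates in by_hash.values():
        # An equal hash only means "probably equal"; confirm with a full compare.
        confirmed = []
        for path in candidates:
            for group in confirmed:
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                confirmed.append([path])
        groups.extend(g for g in confirmed if len(g) > 1)
    return groups
```

As a rough sanity check on "very unlikely": by the birthday approximation, n random inputs into a b-bit hash collide with probability of about n^2 / 2^(b+1). For 50 million files and MD5 (128 bits) that is on the order of 10^-24, while for a 32-bit CRC collisions are practically certain at that count. This assumes non-adversarial inputs; MD5 is not collision resistant against deliberately crafted files.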