Environment versions
Network:
Windows 2003 Server
>>The CRC, or any hash or encryption should be done on the file contents - not the size
>
>Very well
>
>>A CRC is a hash. Since the whole file has to be read in order to calculate the hash, the longer the file the longer it takes the calculate the hash
>>
>>Note that the time is not that significant. I can encrypt ( Rijndael) at speeds of 25 to 35 MB/sec. And this is a block cipher with multiple rounds per block (10, 12 or 14) and lookup tables
>
>Ok, thanks, if I ever have to implement such detection, I will make it OS free specific.
As you do run significant numbers (50 mill) be sure to think about hash collisions - ALL hash functions, CRC included, have information loss which might result in false positives. Very early in computing (disc space being VERY costly then) I had built a structure identifying duplicates via 3 different hashes taken together to form the key and even then check for excact duplication and increment trailing integer in case of collisions...
Previous
Next
Reply
View the map of this thread
View the map of this thread starting from this message only
View all messages of this thread
View all messages of this thread starting from this message only