Hi Markus
feeling lucky today? <bg>
SET UNIQUE ON
INDEX ON Hash( Vorname + Name + Strasse + HausNr + PLZ + Ort + eMail, 5 ) TO _FOLDER_ + "Table1.idx" COMPACT
is guaranteed to be faster than the Two-Step-Dance-Routine, but what happens in case of hash collisions due to an imperfect hash function or a key size that is too small?
I spent some time checking out duplicate-frequency cut-off points to get the best strategy for hash collision treatment, and you optimize it away <g>. To be honest, it might be a better strategy to aim for larger key values, increasing the number of possible hash bins for the 5*10**5 records, than to implement time-consuming collision strategies, as collision management can be quite costly.
I had planned to check different hash functions (not down to 2-byte size, but perhaps 4 or 3 bytes) - 3 bytes will give you about a 2.9% chance of accidental failure per record, but 4 bytes with a nearly perfectly distributing hash function will give you a 0.012% failure chance - not bad for a contest, but it would be bad for production code, and especially bad for the SELECT DISTINCT Samuel was writing about <bg>.
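Those failure chances are easy to sanity-check. Here is a minimal Python sketch (the helper names are my own, and it assumes a perfectly uniform hash) that computes both the per-record collision chance and the birthday-bound chance of *any* collision among the 5*10**5 records:

```python
import math

def per_record_collision(n, bits):
    # Chance that one given record's hash matches at least one of the other n-1.
    N = 2 ** bits
    return 1.0 - (1.0 - 1.0 / N) ** (n - 1)

def any_collision(n, bits):
    # Birthday approximation: chance that at least one pair among all n
    # records shares a hash value, p ~ 1 - exp(-n*(n-1) / (2*N)).
    N = 2 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * N))

n = 500_000
for nbytes in (3, 4, 5):
    bits = 8 * nbytes
    print(nbytes, "bytes:",
          per_record_collision(n, bits),   # 3 bytes -> ~2.9%, 4 bytes -> ~0.012%
          any_collision(n, bits))
```

Note that the two numbers answer different questions: the per-record chance reproduces the 0.012% figure for 4 bytes, but the chance that *some* pair among 500,000 records collides under a 32-bit hash is practically 100% (roughly 29 colliding pairs expected) - which is exactly why it would bite a SELECT DISTINCT.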