Plateforme Level Extreme
Abonnement
Profil corporatif
Produits & Services
Support
Légal
English
Dotnetpro database performance contest
Message
 
À
14/05/2007 07:13:34
Information générale
Forum:
Visual FoxPro
Catégorie:
VFP Compiler for .NET
Divers
Thread ID:
01224231
Message ID:
01225256
Vues:
18
Hi Thomas:

Even several bytes (16 bytes) Hash Functions does not warrant collision free results, but they indeed make the collision detection time lower. The right approach should be one based in the record length, so the number of bytes of the hash function is dependant of the record length in order to keep the expected falses positives values small and at the same time the hash funciton not to be so time consuming. Someone could come with a length based formula that gives the most optimum hash length based in the time used to compute them and the savings in comparisons of false positives.

By the way if the hash function also is proportional to the byte values, i.e. if the hash value is lower for values as "abc" and greater for values "mbc", then it can be used to speed up the sorting and then just sort the few collisions. It is faster to sort just hashcodes and then perform the lenghty byte comparisons.

So this is another application for sorting, and I can see others as grouping.



>Hi Markus
>
>feeling lucky today ? <bg>
>
>*--------------------------------------------
>* Copy only the unique records to a new table
>* using a different table structure.
>*--------------------------------------------
>SET UNIQUE ON
>INDEX ON Hash( Vorname + Name + Strasse + HausNr + PLZ + Ort + eMail, 5 ) TO _FOLDER_ + "Table1.idx" COMPACT
>
is guaranteed to be faster than Two-Step-Dance-Routine, but what happens in case of hash collisions due to imperfect hash functions / not enough key size ?
>
>I spent some time checking out duplicate frequency cut off points to get the best strategy for hash collision treatment and you optimize it away<g>. To be honest, it might be a better strategy to aim for larger key values, increasing the possible hash bins for the 5*10**5 records than to implement time consuming collision straegies, as collision managment can be quite costly.
>I had planned to check different Hashfunctions (not down to 2Byte size, but perhaps 4 or 3 Bytes) - 3 bytes will give you 29% chace of accidental failure, but 4 bytes with a nearly perfectly distributing hash function will give you 0.012% failure chance - not bad for a contest but would be bad for production code and especially bad for select distinct Samuel was writing about <bg>.
Précédent
Suivant
Répondre
Fil
Voir

Click here to load this message in the networking platform