Dotnetpro database performance contest

Plateforme Level Extreme

Abonnement

Profil corporatif

Produits & Services

Support

Légal

English

Dotnetpro database performance contest

Message

14/05/2007 14:05:26

Thomas Ganss
Main Trend
Frankfurt, Allemagne

14/05/2007 11:03:26

Samuel David
Etecnologia
Miami, Floride, États-Unis

Information générale

Forum:

Visual FoxPro

Catégorie:

VFP Compiler for .NET

Titre:

Re: Dotnetpro database performance contest

Divers

Thread ID:

01224231

Message ID:

01225314

Vues:

Hi Samuel,

>Even several bytes (16 bytes) Hash Functions does not warrant collision free results, but they indeed make the collision detection time lower.

I'ld say that the size of the return value should mainly be based on the no. of records needed to compare. If the quality of the hash fucntion is high it will distribute all records evenly across the number of bins characterized by the possible key values.

>The right approach should be one based in the record length, so the number of bytes of the hash function is dependant of the record length in order to keep the expected falses positives values small and at the same time the hash funciton not to be so time consuming. Someone could come with a length based formula that gives the most optimum hash length based in the time used to compute them and the savings in comparisons of false positives.

I think the hash functions for key distribution across bins are slightly differetn from the hash functions used in cryptography, but have not followed the issue.

>
>By the way if the hash function also is proportional to the byte values, i.e. if the hash value is lower for values as "abc" and greater for values "mbc", then it can be used to speed up the sorting and then just sort the few collisions. It is faster to sort just hashcodes and then perform the lenghty byte comparisons.

I think this would automatically mean "bad hash function" as the distribution across the bins would be defined by the first bytes. The idea itself is valid, but should be described as a order-keeping compression, which is different from the "fingerprint"-hash idea.

>So this is another application for sorting, and I can see others as grouping.
Yupp, this was one strategy mainly planned for select distinct and grouping large fields in the SQL interpreter I wrote back in the early 80ies, as I did not know the fox back then.

regards

thomas

Répondre

Fil

Voir

Click here to load this message in the networking platform