Level Extreme platform
Dotnetpro database performance contest
Message
From: 13/05/2007 04:55:18
To: Markus Winhard, Lauton Software GmbH, Nürnberg, Germany (12/05/2007 23:21:51)

General information
Forum: Visual FoxPro
Category: VFP Compiler for .NET, Miscellaneous
Thread ID: 01224231
Message ID: 01225042
Views: 22
>Hi Thomas,
>
>>@Markus: Do you think such an approach would be considered cheating ?
>
>1. It's allowed to create Table1 at runtime.
>2. After step 1 the contents of Adressen.txt have to be in Table1.

So probably another 5% can be saved by header manipulation.
>
>IMHO we can cheat as we like as long as we get there. But I wouldn't go that far. Removing duplicates faster will gain more overall speed.

Agreed, but Index Unique is already my easy fallback for fast duplicate identification as well. So I put on my thinking cap and ran a few tests, though nothing is finished yet. In theory we should be able to get faster identification using hashes, provided we have a fast hash function and the number of duplicates is not inordinately high.
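The basic idea can be sketched outside VFP; this is an illustrative Python version (the names `dedup_by_hash` and the sample records are mine, not from the contest code): hash each record once, and fall back to a full string comparison only when two records share a hash.

```python
import hashlib

def dedup_by_hash(records):
    """Identify duplicates via a hash of each record string.

    Instead of building a unique index over the full record,
    hash each record and compare hashes; a full comparison is
    only needed for records whose hashes collide.
    """
    seen = {}      # 16-byte digest -> record strings already kept
    unique = []
    for rec in records:
        h = hashlib.md5(rec.encode("utf-8")).digest()
        bucket = seen.setdefault(h, [])
        # full comparison only among records with an equal hash
        if rec not in bucket:
            bucket.append(rec)
            unique.append(rec)
    return unique
```

The per-record cost is one hash plus, rarely, a short string comparison, which is why a fast hash function matters so much.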

I have done a small test and found that even using an MD5/16-byte hash and indexing on that 16-byte field is more than twice as fast as creating the unique index. The time was split roughly 2/3 generating the hash values and 1/3 creating the index on the 16-byte field.

I have not yet coded the comparisons needed on equal hash values, but here I assume low frequencies (<5) of identical hash values for different record strings, so the most primitive comparison structures (AT() or ASCAN()) should work best, as their setup overhead is minimal too. I estimate the added overhead at less than the time saved, but I will have to implement it to be sure. The same speedup also shows in the smaller Table1.

A faster hash function would definitely help (I briefly tried Craig Boyd's and Ed Leafe's), and with only 500K records a smaller result value should work better: the time to calculate and the time to index are both reduced, while the number of "false identicals" rises.
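The truncated-hash trade-off can also be sketched; this Python snippet (my own illustration, with a hypothetical `digest_bytes` parameter) groups records by a shortened digest and then resolves "false identicals" by direct comparison inside each group:

```python
import hashlib
from collections import defaultdict

def duplicate_groups(records, digest_bytes=8):
    """Find true duplicates using a truncated hash.

    A shorter digest is cheaper to compute and index, but raises
    the chance of 'false identicals': different records sharing a
    hash. Those are resolved by a direct string comparison inside
    each hash group, so the result lists only real duplicates.
    """
    buckets = defaultdict(list)
    for rec in records:
        h = hashlib.md5(rec.encode("utf-8")).digest()[:digest_bytes]
        buckets[h].append(rec)
    dups = []
    for group in buckets.values():
        seen = set()
        for rec in group:
            if rec in seen:
                dups.append(rec)   # true duplicate, not a collision
            seen.add(rec)
    return dups
```

The cost of a collision here is one extra comparison inside a small group, which stays cheap as long as group sizes remain low.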

More on the topic tonight - and perhaps some googling for a fast hash in C, and then on to creating an FLL.


regards

thomas