Level Extreme platform
Dotnetpro database performance contest
Message
From: 13/05/2007 04:55:18
To: Markus Winhard, Lauton Software GmbH, Nürnberg, Germany (12/05/2007 23:21:51)

General information
Forum: Visual FoxPro
Category: VFP Compiler for .NET, Miscellaneous
Thread ID: 01224231
Message ID: 01225042
Views: 22
>Hi Thomas,
>
>>@Markus: Do you think such an approach would be considered cheating ?
>
>1. It's allowed to create Table1 at runtime.
>2. After step 1 the contents of Adressen.txt have to be in Table1.

So probably another 5% can be saved by header manipulation.
>
>IMHO we can cheat as we like as long as we get there. But I wouldn't go that far. Removing duplicates faster will gain more overall speed.

Agreed, but Index Unique is already my easy fallback for fast duplicate identification as well. So I put on my thinking cap and ran a few tests, though nothing is finished yet. In theory we should be able to get faster identification using hashes, provided we have a fast hash function and the number of duplicates is not inordinately high.
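The basic idea can be sketched outside VFP; this is an illustrative Python version (the names `dedup_by_hash` and the sample records are mine, not from the contest code): hash each record once, and fall back to a full string comparison only when two records share a hash.

```python
import hashlib

def dedup_by_hash(records):
    """Identify duplicates via a hash of each record string.

    Instead of building a unique index over the full record,
    hash each record and compare hashes; a full comparison is
    only needed for records whose hashes collide.
    """
    seen = {}      # 16-byte digest -> record strings already kept
    unique = []
    for rec in records:
        h = hashlib.md5(rec.encode("utf-8")).digest()
        bucket = seen.setdefault(h, [])
        # full comparison only among records with an equal hash
        if rec not in bucket:
            bucket.append(rec)
            unique.append(rec)
    return unique
```

The per-record cost is one hash plus, rarely, a short string comparison, which is why a fast hash function matters so much.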

I have done a small test and found that even using an MD5/16-byte hash and indexing on that 16-byte field is more than twice as fast as creating the unique index. The time was split roughly 2/3 generating the hash values and 1/3 creating the index on the 16-byte field.

I have not yet coded the comparisons needed on equal hash values, but here I assume low frequencies (<5) of identical hash values for different record strings, so the most primitive comparison structures (AT() or ASCAN()) should work best, as their setup overhead is minimal too. I estimate the added overhead at less than the time saved, but I will have to implement it to be sure. The same speedup also shows in the smaller Table1.

A faster hash function would definitely help (I briefly tried Craig Boyd's and Ed Leafe's), and with only 500K records a smaller result value should work better: the time to calculate and the time to index are both reduced, while the number of "false identicals" rises.
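The truncated-hash trade-off can also be sketched; this Python snippet (my own illustration, with a hypothetical `digest_bytes` parameter) groups records by a shortened digest and then resolves "false identicals" by direct comparison inside each group:

```python
import hashlib
from collections import defaultdict

def duplicate_groups(records, digest_bytes=8):
    """Find true duplicates using a truncated hash.

    A shorter digest is cheaper to compute and index, but raises
    the chance of 'false identicals': different records sharing a
    hash. Those are resolved by a direct string comparison inside
    each hash group, so the result lists only real duplicates.
    """
    buckets = defaultdict(list)
    for rec in records:
        h = hashlib.md5(rec.encode("utf-8")).digest()[:digest_bytes]
        buckets[h].append(rec)
    dups = []
    for group in buckets.values():
        seen = set()
        for rec in group:
            if rec in seen:
                dups.append(rec)   # true duplicate, not a collision
            seen.add(rec)
    return dups
```

The cost of a collision here is one extra comparison inside a small group, which stays cheap as long as group sizes remain low.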

More on the topic tonight - and perhaps some googling for a fast hash in C, and then on to creating an FLL.


regards

thomas