Category: VFP Compiler for .NET
>Hi Thomas,
>
>>@Markus: Do you think such an approach would be considered cheating ?
>
>1. It's allowed to create Table1 at runtime.
>2. After step 1 the contents of Adressen.txt have to be in Table1.
So probably another 5% can be saved by header manipulation.
>
>IMHO we can cheat as we like as long as we get there. But I wouldn't go that far. Removing duplicates faster will gain more overall speed.
Agreed, but Index Unique is already my easy fallback for fast duplicate identification as well. So I put on my thinking cap and ran a few tests, but nothing is finished yet. In theory we should be able to identify duplicates faster using hashes, provided that we have a fast hash function and the number of duplicates is not inordinately high.
I have done a small test and found that even computing an MD5 (16-byte) hash and indexing on that 16-byte field is more than twice as fast as creating the unique index. The time was split roughly 2/3 for generating the hash values and 1/3 for creating the index on the 16-byte field. I have not yet coded the comparisons needed when hash values are equal, but I assume low frequencies (<5) of identical hash values with different record strings, so the most primitive comparison structures (AT() or ASCAN()) should be best, as their setup overhead is minimal. I estimate the added overhead to be less than the time saved, but I will have to implement it to be sure. That speedup is also seen in the smaller Table1. A faster hash function would definitely help (so far I have briefly tried the ones from Craig Boyd and Ed Leafe), and with only 500K records a smaller hash value should work better: the time needed to calculate it is reduced and the indexing time is reduced, whereas the number of "false identicals" rises.
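To make the idea concrete, here is a rough sketch in Python rather than VFP, so it says nothing about the timings above. It hashes each record, groups records by a (possibly truncated) MD5 value, and falls back to a plain linear comparison only inside a group, which stands in for the AT()/ASCAN() check. The function name dedupe, the hash_bytes parameter and the sample records are made up for illustration.

```python
import hashlib
from collections import defaultdict


def dedupe(records, hash_bytes=16):
    """Return (unique records in first-seen order, number of false identicals).

    hash_bytes is a hypothetical knob: how many bytes of the MD5 digest to keep.
    Fewer bytes mean a smaller key, but more "false identicals"
    (equal hash value, different record string).
    """
    buckets = defaultdict(list)  # truncated hash -> distinct records with that hash
    unique = []
    false_identicals = 0
    for rec in records:
        key = hashlib.md5(rec.encode("utf-8")).digest()[:hash_bytes]
        bucket = buckets[key]
        if rec in bucket:
            # Linear scan of a tiny list: a true duplicate, so skip it.
            continue
        if bucket:
            # Same hash value but a different record string.
            false_identicals += 1
        bucket.append(rec)
        unique.append(rec)
    return unique, false_identicals


# Made-up sample rows, not taken from Adressen.txt.
rows = ["Meier;Hamburg", "Schmidt;Berlin", "Meier;Hamburg", "Kurz;Bonn"]
unique_rows, collisions = dedupe(rows)
print(len(unique_rows), collisions)
```

Because every equal-hash case is verified against the actual record string, shrinking hash_bytes can only cost comparison time, never correctness. Whether the smaller key pays off at 500K records is exactly what would have to be measured.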
More on the topic tonight, and perhaps some googling for a fast hash in C and then on to creating an FLL.
regards
thomas