Level Extreme Platform
Fifty ways to find your brother - searches
Message
From: 02/12/2019 11:28:20
To: 02/12/2019 09:41:31
General information
Forum: Visual FoxPro
Category: Coding, syntax and commands
Miscellaneous
Thread ID: 01672112
Message ID: 01672128
Views: 83
Too bad I am too shy to hint at local dealers reachable via Amazon... ;-))

I spent quite a few weeks trying to optimize my result sets, but was never really able to automatically separate the alpha errors (false positives) from the beta errors (false negatives) that the fuzzy approach generates more often. That was the reason I created my own rule engine to tune the filtering to different input sources. It got me further, but in the end:

Learn to live with it, even if it goes against previous conditioning that machine results ought to be correct.
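Why the two error types cannot be separated automatically can be shown with a few lines of code. A minimal Python sketch, assuming a small hand-labeled sample of candidate pairs, each with a hypothetical similarity score between 0 and 1 and a flag saying whether it really is the same person. It counts alpha errors (accepted but wrong) and beta errors (rejected but right) for a given acceptance threshold; moving the threshold only trades one error type for the other.

```python
def error_counts(pairs, threshold):
    """Count alpha (false positive) and beta (false negative) errors.

    pairs: iterable of (score, is_same_person) tuples from a hand-labeled sample.
    A pair is accepted as a match when score >= threshold.
    """
    alpha = beta = 0
    for score, is_same in pairs:
        accepted = score >= threshold
        if accepted and not is_same:
            alpha += 1  # accepted a pair that is NOT the same person
        elif not accepted and is_same:
            beta += 1  # rejected a pair that IS the same person
    return alpha, beta


# Hypothetical labeled sample: (similarity score, truly the same person?)
sample = [(0.95, True), (0.90, False), (0.80, True),
          (0.70, False), (0.60, True), (0.40, False)]

for t in (0.5, 0.75, 0.92):
    print(t, error_counts(sample, t))
```

On this made-up sample the three thresholds give (alpha, beta) = (2, 0), (1, 1) and (0, 2): no threshold gets rid of both.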

>Thanks Thomas...all great suggestions...but sorry, champagne does not travel well...
>
>>A few hints: even a badly coded but correctly running C version runs circles around my optimized VFP version. I was quite proud of cutting the time needed in half compared to version 1, and to 2/3 of the runtime of the code Craig wrote. He is a no-nonsense coder but did not aim for the fewest interpreted lines; aiming for those results in fast runtime but slow code comprehension ;-)
>>
>>With C you cut the time needed down to 15-30% of what "my" VFP version needs, or to about 10% of the time needed by the first example.
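The two speed-up figures above are consistent with each other. In my own notation (not from the post), with T_1 the runtime of the first version, T_V that of the optimized VFP version and T_C that of the C version:

```latex
T_V \approx 0.5\,T_1, \qquad T_C \approx (0.15 \text{ to } 0.30)\,T_V
\;\Rightarrow\; T_C \approx (0.075 \text{ to } 0.15)\,T_1 \approx 0.1\,T_1
```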
>>
>>Do NOT aim for a "winning result" or "closest match". Let a human select from a list of the best results, perhaps also making the list size dynamic if the criteria are good. Typically one approach works well in area 1, another in area 2, and both might fail in area 4...
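Returning a shortlist instead of a single winner could look like the following minimal Python sketch. The dynamic-size rule is a hypothetical interpretation, not something the post specifies: always show the top `min_n` candidates, then keep adding candidates whose score is within `tolerance` of the best one, up to `max_n`. When many candidates score close together, the human gets a longer list to choose from.

```python
def shortlist(candidates, min_n=5, max_n=20, tolerance=0.1):
    """Return the best candidates for a human to choose from.

    candidates: list of (record_id, score) tuples, higher score = better match.
    Hypothetical rule: top min_n always, then extend while a candidate is within
    `tolerance` of the best score, never past max_n (assumes min_n <= max_n).
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked:
        return []
    best = ranked[0][1]
    result = ranked[:min_n]
    for cand in ranked[min_n:max_n]:
        if best - cand[1] > tolerance:
            break  # ranked is sorted, so every later candidate is even further away
        result.append(cand)
    return result
```

With `min_n=1`, three candidates scoring 0.95, 0.93 and 0.90 all make the list, while one at 0.70 does not.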
>>
>>Do NOT aim for a single result dimension during the "comparison run". The comparison run takes most of the time, so save the result of each criterion in its own discrete numeric field. VFP can combine those discrete result fields into a single result dimension very quickly, which lets you play with all the check runs, pivot-table style, without repeating the slow comparison run.
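The idea of storing discrete per-criterion results once and folding them into one score later can be sketched as follows. In VFP the rows would be a table and the re-ranking a SELECT ... ORDER BY; this Python version uses a dataclass as a stand-in for a table row. The criterion fields and weight names are hypothetical examples, not the author's actual fields.

```python
from dataclasses import dataclass


@dataclass
class ComparisonResult:
    """One row per compared pair, written once by the slow comparison run.

    Field names are hypothetical examples; each holds a small discrete value.
    """
    pair_id: int
    last_name_distance: int   # edit distance on last names (0 = identical)
    first_name_distance: int  # edit distance on first names
    phonetic_match: int       # 1 if the phonetic codes agree, else 0
    birth_year_match: int     # 1 if the birth years agree, else 0


def combined_score(r, weights):
    """Fold the stored discrete fields into one number; cheap enough to re-run often."""
    return (weights["phonetic"] * r.phonetic_match
            + weights["birth_year"] * r.birth_year_match
            - weights["last_dist"] * r.last_name_distance
            - weights["first_dist"] * r.first_name_distance)


def rank(results, weights):
    """Re-rank stored results under a new weighting without redoing any comparison."""
    return sorted(results, key=lambda r: combined_score(r, weights), reverse=True)
```

Trying a different weighting is then just another call to `rank` on the stored rows; the expensive string comparisons are never repeated.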
>>
>>Concentrate on a good ordinal scale, built from those discrete results, to order your results, and let the user decide, perhaps by offering a "give me 3x the result size, I want to check more" option in the GUI. Work with the data, the user AND the selection criteria ;-)
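An ordinal scale over discrete results can be as simple as a sort key that compares the most decisive criterion first. A minimal Python sketch, assuming hypothetical per-candidate counts of exact field matches, phonetic matches and an edit distance; the "3x" option from the hint only widens the cut-off, it does not change the order.

```python
def ordinal_key(r):
    """Hypothetical ordinal scale: tuples compare left to right.

    More exact field matches beat more phonetic matches, which beat a smaller
    edit distance. Negation makes "more is better" sort first in ascending order.
    """
    return (-r["exact_matches"], -r["phonetic_matches"], r["edit_distance"])


def result_page(candidates, base_size=10, want_more=False):
    """Order candidates on the ordinal scale; 'give me 3x' just shows more of them."""
    size = base_size * 3 if want_more else base_size
    return sorted(candidates, key=ordinal_key)[:size]
```

A lexicographic key keeps the ordering explainable to the user ("two exact matches rank above one"), which a single blended number does not.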
>>
>>Allow for hyphenated last names, birth names and, if possible, last names from a different marriage.
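One way to allow for hyphenated, birth and earlier married names is to search against every last name a person might be filed under. A minimal Python sketch with hypothetical helper names; it uses exact, case-insensitive comparison for brevity, whereas in practice each variant would be fed into the fuzzy comparison instead.

```python
def name_variants(last_name, birth_name=None, previous_names=()):
    """Collect every last name a person might be filed under (hypothetical helper).

    Hyphenated names are also split into their parts, so "Smith-Jones"
    can be found under "Smith" or "Jones" alone.
    """
    variants = set()
    for name in (last_name, birth_name, *previous_names):
        if not name:
            continue
        name = name.strip().upper()
        variants.add(name)
        if "-" in name:
            variants.update(part.strip() for part in name.split("-") if part.strip())
    return variants


def last_name_matches(query, person):
    """True if the query equals any variant of the person's last names (case-insensitive)."""
    return query.strip().upper() in name_variants(**person)
```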
>>
>>[send me a bottle of champagne for every day the above hints saved in coding time] ;-)
>>
>>>Not a lot of time to look at it today, but my quick look says it's very interesting!
>>>Here is the link: http://fox.wikis.com/wc.dll?Wiki~LevenshteinAlgorithm
>>
>>>>Google for Levenshtein in VFP and you will find my VFP-optimized code, IIRC on the FoxPro Wiki,
>>>>but it is much better to link in one of the C versions, as VFP string handling is not well suited to the task.
>>>>Best done twice: once on the exact spelling, once on a phonetically simplified/normalized form.
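The "best done twice" advice can be sketched as one edit-distance function applied to both the spelling and a phonetic code. This Python version only shows the logic; the post recommends a linked-in C implementation for speed. Soundex is swapped in here as an example phonetic normalization, since the post does not say which scheme was used, and this is a straightforward form of American Soundex (implementations differ in edge cases, such as how Y is treated).

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute each cost 1), two-row version."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]


SOUNDEX_CODES = {ch: digit
                 for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                        ("L", "4"), ("MN", "5"), ("R", "6"))
                 for ch in letters}


def soundex(name):
    """Letter + 3 digits; same-code neighbours collapse, vowels (and Y) separate them."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(c, "")  # vowels map to "" and reset prev
        if code and code != prev:
            result.append(code)
        prev = code
    return "".join(result)[:4].ljust(4, "0")


def compare_twice(a, b):
    """Edit distance on the spelling as entered, then on the phonetic codes."""
    return levenshtein(a.upper(), b.upper()), levenshtein(soundex(a), soundex(b))
```

For example, "Meyer" and "Maier" are two edits apart as spelled but share the code M600, so the second pass scores them as identical.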
>>
>>>>>This is plaintiff data, relatively small (only about 50k records after 40 years in business). And we don't get the address, phone data etc. until the file is settled (years into the process) - not that I would have wanted to have had to add that into the mix.
>>>>>My guess is that it would have taken a LONG time to work all that code into your searches.
>>>>>
>>>>>
>>>>>>Yes to all except 3, 4 and 13.
>>>>>>Not PHdBase, but 2 similar approaches.
>>>>>>Plus a few more, like calculating edit distance, as they call it today ;-)
>>>>>>Plus doing the same on address fields
>>>>>>Plus contact data (multiple phones, emails...)
>>>>>>Plus a rule engine which can be tweaked for certain profiles of data
>>>>>>Plus a scoring system so you can order results by their estimated "relevance" for further tweaking.
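Matching on contact data mostly comes down to normalizing before comparing. A minimal Python sketch with hypothetical helper names, assuming each record holds lists of phone numbers and email addresses. The `keep_last` option is also an assumption: comparing only the last N digits ignores differing country or area-code prefixes, at the price of more false positives. Lower-casing the whole email address assumes mailboxes are treated case-insensitively, as they usually are in practice.

```python
import re


def normalize_phone(raw, keep_last=None):
    """Reduce a phone number to its digits so formatting differences do not matter."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    return digits[-keep_last:] if keep_last else digits


def normalize_email(raw):
    return raw.strip().lower()


def shares_contact(a, b, keep_last=None):
    """True if two records share at least one normalized phone number or email address."""
    phones_a = {normalize_phone(p, keep_last) for p in a.get("phones", [])} - {""}
    phones_b = {normalize_phone(p, keep_last) for p in b.get("phones", [])} - {""}
    emails_a = {normalize_email(e) for e in a.get("emails", [])} - {""}
    emails_b = {normalize_email(e) for e in b.get("emails", [])} - {""}
    return bool(phones_a & phones_b) or bool(emails_a & emails_b)
```

A shared phone or email would then be one more discrete field in the stored comparison results, not a verdict on its own.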
>>>>>>
>>>>>>Used to run regularly on 1 to 9 million records, linked across more than a couple of related tables.
>>>>>>looking for duplicates, weeding/singling out family groups
>>>>>>target marketing
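Turning pairwise "these two belong together" decisions into duplicate or family groups is a classic job for a union-find (disjoint-set) structure. A minimal Python sketch, assuming the matching step already produced pairs of record IDs; the post does not say how its system built the groups.

```python
def group_pairs(pairs):
    """Merge pairwise matches into groups using union-find.

    pairs: iterable of (record_id_a, record_id_b) judged to belong together.
    Returns a list of sets; links are transitive, so A~B and B~C give {A, B, C}.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())
```

Because the links are transitive, two fuzzy matches can chain unrelated people into one group, which is one more reason to keep a human in the loop.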
>>>>>>
>>>>>>system grew over a few years ;-)
>>>>>>