Fifty ways to find your brother - searches
Message
From
28/11/2019 17:15:52
To
28/11/2019 15:04:47
General information
Forum:
Visual FoxPro
Category:
Coding, syntax and commands
Miscellaneous
Thread ID:
01672112
Message ID:
01672113
Views:
81
Yes to all except for 3, 4 and 13.
Not PHdBase, but 2 similar approaches
Plus a few more, like calculating edit distance, as they call it today ;-)
Plus doing the same on address fields
Plus contact data (multiple phone, email...)
Plus a rule engine which can be tweaked for certain profiles of data
Plus a scoring system so you can order results by estimated "relevance" for further tweaking.

used to run regularly on 1 - 9 million data entries, linked across more than a couple of related tables.
looking for duplicates, weeding/singling out family groups
target marketing

system grew over a few years ;-)
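
For readers who have not met edit distance or relevance scoring, here is a minimal Python sketch of the two ideas mentioned above. It is not the system described in this reply. The field names ("last", "first", "address") and the weights are hypothetical, chosen only to show how a per-field similarity can be combined into one score for ordering candidates.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a 0..1 similarity, ignoring case and outer spaces."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Hypothetical weights; a real rule engine would tune these per data profile.
WEIGHTS = {"last": 0.5, "first": 0.3, "address": 0.2}


def score(query: dict, candidate: dict) -> float:
    """Weighted similarity over the fields the user actually filled in."""
    used = {f: w for f, w in WEIGHTS.items() if query.get(f)}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(w * similarity(query[f], candidate.get(f, ""))
               for f, w in used.items()) / total


def rank(query: dict, candidates: list) -> list:
    """Order candidates by descending score."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)
```

Calling rank({"last": "Smith", "first": "John"}, records) puts the closest candidates first. A cut-off on the score would then decide what counts as a probable duplicate, and that threshold is exactly the kind of thing that needs tweaking per data set.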


>Hi all,
>
>Probably a slow posting day since many are indulging in something other than VFP code today. If you don't "get" the title, try googling it...(hint, Paul Simon).
>
>I just finished writing up some notes for our requirements document for a rewrite of one of my client's apps. I wanted to write an article on this years ago when I first worked on this class but never got around to it. But here it is as written for the requirements doc. Note the new system will not be in VFP so there is not much VFP detail - but you can figure that out.
>
>As the preamble to this section in the document says, most of these were developed after someone would come to me and say "why did the system not find this name" - so the list may not be exhaustive but the searches were added based upon the frequency of not finding the contact in their system.
>
>I would be interested to know if anyone else has any classes that do something like this or other systems you have hooked into to do similar searches or even if you have incorporated other methods to catch duplicate names (before they are entered by the user).
>
>Searches performed in order of priority/ranking:
>
>1. Exact match: last name and first name must match exactly (but case insensitive). User can enter just the last name or the first name (although they tend to get too many matches with just one name). Birthdate is not a factor in the search (or in any name searches, it is done separately below).
>
>2. Partial match: last name and/or first name can match partially - i.e. the comparison on the two strings is done until one string runs out of characters (a VFP string comparison feature); e.g. if the entered string is “Rob” and the string in the table is “Robert”, the strings match up until the point where the 1st string runs out of characters and so it is considered a match. Note that both last name and first name are required as typically too many matches otherwise.
>
>3. “Dutch Special” search: matches are occasionally thrown off by spaces, particularly in Dutch/Italian/German etc. last names, e.g. “Van Den Berg” can also be “Vandenberg”. Spaces are stripped from the names before searching. Last name is required, first name is not. Exact search should be performed, as otherwise there are too many matches.
>
>4. Nickname search: for each first name entered, a “nickname” / alternate name is substituted for the first name (from a table of alternate names) and the search performed e.g. “Charles” is entered and “Chuck” is also searched; both first name and last name are required; search is for exact matches only.
>
>5. Nee name search: search is performed with the last name searching the “Nee” field in the database instead of the last name field; last name is required but first name is not; search is for exact matches only.
>
>6. Phonetic “Fuzzy” search: search is performed using an add-in class (PhdBase) that matches names using a phonetic equivalent e.g. “Smith” will match with “Smyth”; user can search on either name i.e. both are not required; current search classes may provide this feature as standard.
>
>7. First 4 letter search: search is performed using just the first 4 letters of the first name and last name; over time it has been determined that if there are errors in spelling the name, they are usually towards the end of the name; by searching on just the first 4 letters, matches are often found; only performed if both last name and first name are entered, as otherwise there are too many matches.
>
>8. First name changed to an initial: search is done on the last name and the first name entered converted to an initial. This handles the case where the user enters the entire first name but the table only contains the initial e.g. user enters “Lyndon B. Johnson” and table contains “L. B. Johnson”. In our case, it is sometimes because the exact first name is not known when the file is opened. Note that the reverse problem is not common (i.e. user enters an initial and table contains full name) so that search is not done. Both last name and first name are required; search is for exact matches only.
>
>9. Initial in first name removed: if the user enters an initial as part of the first name, it is removed and the search is done with the remainder of the first name entered. This handles the case where the user enters “J. Edgar Hoover” and the table contains “Edgar Hoover”. Both last name and first name are required; if removing the leading initial results in the first name being blank, the search is skipped.
>
>10. First and last names switched: the last name and first name are switched and the search performed. This was a frequent problem particularly with Asian names as their surname is often shown first instead of the western way of showing the given name first. Both last name and first name are required; search is for exact matches only.
>
>11. Birthdate search: search is on birth date if entered. Though rare, users have sometimes found a match with the birthdate where the names are so poorly communicated to them over the phone that other methods have not found the match.
>
>12. Missing first name: search is on the last name without the first name where the table is missing the first name. This handles the situation where the file is opened without knowing the first name. The search criteria specifically include “AND EMPTY(Firstname)” in the search clause. This prevents a second file being opened by a user who then knows the first name. Only the last name is required to be entered (but in reality, usually the user is entering the first name). Search is for exact matches only.
>
>13. Preferred name search: search is on last name and the first name is searched against “Preferred name” field in the table. This field was recently introduced as some persons go by their “preferred” name whereas the table usually contains their legal name (e.g. “Gabby” for “Gabrielle”). Both last name and first name are required; search is for exact matches only.
>
>14. First name contained in first name field: search is done on the user entered first name where the first name is not the first word in the first name field e.g. user enters “Eric” and the first name field contains “James Eric”. Note that this is currently caught by the “phonetic” match above as it considers a record a match if a phonetically matched name occurs anywhere within the given field.
>
>That's all folks...Albert
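
Since the new system will not be in VFP, here is a rough Python sketch of how a prioritized cascade like the one quoted above could be wired up. It covers only searches 1, 2, 3, 4, 7 and 10, and it is an illustration under assumptions, not Albert's class. The record layout (dicts with "last" and "first" keys), the NICKNAMES table and every function name are hypothetical. The partial match is modelled as "stored name starts with entered name", which is one reading of the VFP comparison described in search 2.

```python
# Hypothetical alternate-name table; the real table is not shown in the post.
NICKNAMES = {"charles": ["chuck"]}


def norm(s: str) -> str:
    """Case-insensitive comparison key."""
    return s.casefold().strip()


def squash(s: str) -> str:
    """Comparison key with all spaces removed (the "Dutch Special")."""
    return norm(s).replace(" ", "")


def exact(last, first, rec):
    # Search 1: each name that was entered must match exactly.
    return all(not q or q == norm(rec[k])
               for q, k in ((last, "last"), (first, "first")))


def partial(last, first, rec):
    # Search 2: the stored name begins with what the user entered.
    return (norm(rec["last"]).startswith(last)
            and norm(rec["first"]).startswith(first))


def no_spaces(last, first, rec):
    # Search 3: last names compared with spaces stripped; first name optional.
    return (squash(rec["last"]) == squash(last)
            and (not first or norm(rec["first"]) == first))


def nickname(last, first, rec):
    # Search 4: exact last name, stored first name is a known alternate.
    return (norm(rec["last"]) == last
            and norm(rec["first"]) in NICKNAMES.get(first, []))


def first_four(last, first, rec):
    # Search 7: compare only the first 4 letters of each name.
    return (norm(rec["last"])[:4] == last[:4]
            and norm(rec["first"])[:4] == first[:4])


def swapped(last, first, rec):
    # Search 10: last name and first name switched.
    return norm(rec["last"]) == first and norm(rec["first"]) == last


# (name, rule, requirement) in priority order; requirement is which
# names must have been entered: "any", "last" or "both".
RULES = [
    ("exact", exact, "any"),
    ("partial", partial, "both"),
    ("no_spaces", no_spaces, "last"),
    ("nickname", nickname, "both"),
    ("first_four", first_four, "both"),
    ("swapped", swapped, "both"),
]


def search(last: str, first: str, records: list) -> list:
    """Return (rule_name, record) pairs, each record reported once,
    labelled with the highest-priority rule that found it."""
    last, first = norm(last), norm(first)
    if not (last or first):
        return []
    hits, seen = [], set()
    for name, rule, needs in RULES:
        if needs == "both" and not (last and first):
            continue
        if needs == "last" and not last:
            continue
        for i, rec in enumerate(records):
            if i not in seen and rule(last, first, rec):
                seen.add(i)
                hits.append((name, rec))
    return hits
```

Keeping each search as its own small function in an ordered table makes it cheap to add the next "why did the system not find this name" case. The rule name returned with each hit also shows the user why a record turned up.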