Inaccurate name matching
Message
From
03/10/2001 03:32:52
 
 
To
02/10/2001 18:54:51
Josh Fedke
National Financial Corporation
Milwaukee, Wisconsin, United States
General information
Forum:
Visual FoxPro
Category:
Other
Miscellaneous
Thread ID:
00563349
Message ID:
00563453
Views:
36
Josh,

I've written a search tool with this functionality, using phdBase for the fuzzy search (http://www.hallogram.com/phdbase/). Here are some sample phdBase scores, each measured against the search expression at the top of its group:
Score   Expression
100     Smith          && search expression
63      Smitn
60      Smythe
56      Simth
50      Schmitt

100     Browse         && search expression
92      Borwse

100     1234576        && search expression
94      1234567
86      1324576
79      2143576
52      0221-0765721
Here's an extract from phdBase's help file:
╓─────────────────────────────────╖
║  phd: Controlling Fuzzy Search  ║
╙─────────────────────────────────╜
You can control PhDbase's fuzzy search in a variety of ways.  This control changes the results returned by both the LIKE search operator and the phd("LIKE...") function.

As part of its fuzzy search logic, PhDbase generates a "score" from 0-100 when comparing two words.  A score of 100 means they're a perfect match and a score of 0 means they don't match at all.  You can use the SCORE function to retrieve PhDbase's score, like this:

    ?phd("score mouse mouse")
            100
    ?phd("score moose mouse")
            86
    ?phd("score amuse mouse")
            50
    ?phd("score bullwinkle moose")
            0

For both the LIKE operator and function, PhDbase ignores words which fall below a certain threshold.  By default, this threshold is set to 55.  You can set the threshold to a different value like this:

    =phd("threshold 65")

This would result in "tighter" matches; in library parlance this is called increasing "precision" at the expense of "recall."
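
To make the precision/recall trade-off concrete outside of PhDbase, here is a minimal Python sketch (not from the help file). It assumes a stand-in similarity measure, `difflib.SequenceMatcher`, so its scores will not match PhDbase's numbers; the function names `score` and `like` are hypothetical and only mirror the wording above.

```python
from difflib import SequenceMatcher

def score(a: str, b: str) -> int:
    """Similarity from 0 to 100. Stand-in metric, NOT PhDbase's algorithm."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

def like(target: str, words: list[str], threshold: int = 55) -> list[tuple[int, str]]:
    """Return (score, word) pairs at or above the threshold, best first."""
    scored = [(score(target, w), w) for w in words]
    kept = [pair for pair in scored if pair[0] >= threshold]
    # Stable sort: ties keep their original order.
    return sorted(kept, key=lambda pair: -pair[0])

words = ["mouse", "moose", "amuse", "bullwinkle"]
loose = like("mouse", words, threshold=55)   # more recall
tight = like("mouse", words, threshold=90)   # more precision
```

Raising the threshold can only shrink the result list: every word in `tight` is also in `loose`, which is the "precision at the expense of recall" effect described above.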

This statement will set the threshold back to the default value. It's preferable to use this form rather than "threshold 55":

    =phd("threshold default")

PhDbase uses a fuzzy search algorithm with "proximity" and "spot checking" components.  "Proximity" refers to the number of words in the index surrounding the word you're looking for in alphabetic order.  PhDbase always scans this number of words for close matches. The default proximity scan is 100 words in either direction.  You can change PhDbase's proximity scan like this:

    =phd("proximity 500")

The above statement will cause PhDbase to inspect 1000 words during its proximity scan.
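
The proximity idea can be sketched in Python as follows (again not from the help file, and not PhDbase's actual implementation). The sketch assumes the index is a plain alphabetically sorted list, uses `bisect` to find where the target would sort, and scores only the neighbours with a stand-in `difflib` metric. The names `proximity_window` and `fuzzy_lookup` are hypothetical.

```python
from bisect import bisect_left
from difflib import SequenceMatcher

def proximity_window(index: list[str], target: str, proximity: int = 100) -> list[str]:
    """Up to `proximity` entries before and after the target's sort position.

    `index` must already be sorted. If the target itself is in the index,
    it is the first entry of the "after" half.
    """
    pos = bisect_left(index, target)
    return index[max(0, pos - proximity): pos + proximity]

def fuzzy_lookup(index, target, proximity=100, threshold=55):
    """Score only the words inside the proximity window (stand-in metric)."""
    hits = []
    for word in proximity_window(index, target, proximity):
        s = round(100 * SequenceMatcher(None, target, word).ratio())
        if s >= threshold:
            hits.append((s, word))
    return sorted(hits, key=lambda h: -h[0])

index = sorted(["smith", "smitn", "smythe", "schmitt", "simth", "jones", "brown"])
narrow = proximity_window(index, "smith", proximity=1)
wide = proximity_window(index, "smith", proximity=500)
```

With a narrow window, a word that sorts far away from the target (here "schmitt") is never inspected at all, however similar it is. A wider window trades speed for coverage.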

In unusual situations, you can tell PhDbase to scan the entire index for hard-to-find words like this:

    =phd("proximity all")

We tried "proximity all" on the FoxPro help file and =phd("<>") found the word "rentalcost."

This statement will set the proximity scan back to its default value:

    =phd("proximity default")

PhDbase's spot checking heuristics allow words to be found which are phonetically similar to the target word, even though those words are in very different alphabetic positions; for example, "phonetics" and "foneticks."  In addition, PhDbase can find many common typographical errors such as "typo" vs. "ytpo."  PhDbase's "spot check" inspection scans 20 words around each heuristic by default.  You can change the value like this:

    =phd("spotcheck 50")

or you can set it back to the default like this:

    =phd("spotcheck default")

Note that PhDbase's spot checking is completely eliminated when "proximity all" is specified.

You can retrieve all of PhDbase's fuzzy search parameters without changing them like this:

    ?phd("threshold")
    ?phd("proximity")
    ?phd("spotcheck")
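
Separately from the help-file extract: the original question asks for names that are "1 or 2 letters different". A minimal Python sketch of that rule, without phdBase, is an edit-distance filter. It assumes the optimal string alignment variant (insert, delete, substitute, or swap two adjacent letters, each counted as one edit) and compares whole names after lowercasing and collapsing whitespace. The function names `edit_distance` and `close_names` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b."""
    prev2, prev = None, list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            best = min(prev[j] + 1,          # delete from a
                       cur[j - 1] + 1,       # insert into a
                       prev[j - 1] + cost)   # substitute (or match)
            # Adjacent transposition, e.g. "mi" <-> "im"
            if i > 1 and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                best = min(best, prev2[j - 2] + 1)
            cur.append(best)
        prev2, prev = prev, cur
    return prev[-1]

def normalize(name: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(name.lower().split())

def close_names(query: str, names: list[str], max_edits: int = 2):
    """Return (distance, name) pairs within max_edits, closest first."""
    q = normalize(query)
    hits = [(edit_distance(q, normalize(n)), n) for n in names]
    return sorted((h for h in hits if h[0] <= max_edits), key=lambda h: h[0])

names = ["Smith", "Smitn", "Smythe", "Simth", "Schmitt"]
matches = close_names("Smith", names)
```

Because the comparison runs over the whole name, a seven-word name with one misspelled word still counts as one or two edits. Reordered words are not handled; that would need a per-word comparison on top of this.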
Hope this helps.

>Hello All:
>
> I'm trying to write an application that takes a user-entered name and matches it with names in a table. If the name isn't found, I want to try an inaccurate match and get a list of names that are 1 or 2 letters different. These are international names, which can easily have 7 separate words in a name. Does anyone have any ideas on how to do something like this? Thanks in advance.
>
> -Josh
Daniel