General information
Category:
Databases
Environment versions
Network:
Windows 2003 Server
>When using a search field for such a need, there is a risk of finding an unrelated record.
>
>For example, I keep only ASCII 65 to 90 and the space in the search field. This gives a pretty interesting conversion, and I can see some useful things we can do with it. But let's take, for example, some companies like these:
>
>A-1 AUTO BODY
>A-2 AUTO BODY
>A-3 AUTO BODY
>A-4 AUTO BODY
>
>...where all those company names are in fact unique. But in the search field, after the conversion, they all end up the same: A AUTO BODY.
>
>So, when searching, there is no guarantee which one it will land on.
>
>When using such an approach, do you validate that the search field must be unique, assuming that the main field must be?
We are either looking for duplicates for data cleaning or matching entries against partially erroneous data.
So those 4 names would get a high similarity on company name, but if no other criterion hints at a match
(phone number, address, contact person or similar), those 4 will be treated as different.
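To make that concrete, here is a rough Python sketch, not our actual code. The field names (name, phone, address, contact) and both function names are made up for illustration, and uppercasing plus collapsing repeated spaces are my assumptions on top of the "ASCII 65 to 90 and space" rule from the quoted message.

```python
import re

def search_key(name: str) -> str:
    """Keep only A-Z (ASCII 65-90) and spaces, as in the quoted example.

    Uppercasing first and collapsing repeated spaces are assumptions.
    """
    kept = re.sub(r"[^A-Z ]", "", name.upper())
    return " ".join(kept.split())

def likely_same(a: dict, b: dict) -> bool:
    """Treat two records as a match only if the name keys agree AND
    at least one other (hypothetical) field agrees as well."""
    if search_key(a["name"]) != search_key(b["name"]):
        return False
    # The name key alone is not enough: require one corroborating field.
    return any(
        a.get(field) and a.get(field) == b.get(field)
        for field in ("phone", "address", "contact")
    )

shop1 = {"name": "A-1 AUTO BODY", "phone": "555-0101"}
shop2 = {"name": "A-2 AUTO BODY", "phone": "555-0202"}
shop1_typo = {"name": "A1 Auto Body", "phone": "555-0101"}

print(search_key(shop1["name"]) == search_key(shop2["name"]))  # keys collide
print(likely_same(shop1, shop2))       # no corroborating field
print(likely_same(shop1, shop1_typo))  # same phone backs up the name match
```

The point is only that the collapsed key is used to find candidates, never as a unique identifier on its own.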
Problematic cases are things like
x 001 trust fund
x 002 trust fund
x 003 trust fund
located at the same address with the same spokesperson: here we have to decide up front whether each separate
corporate identity gets its own entry or all are bundled as one site (for mailing or contact scheduling, for instance).
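As a small illustration of that up-front decision, here is a Python sketch where a single flag chooses between the two policies. The function name, the flag and the record fields are all hypothetical, as is the sample data.

```python
from collections import defaultdict

def group_entries(records, bundle_by_site: bool) -> dict:
    """Group record names under a key chosen by the policy flag.

    bundle_by_site=True  -> one group per (address, contact) pair
    bundle_by_site=False -> one group per distinct corporate name
    """
    groups = defaultdict(list)
    for rec in records:
        if bundle_by_site:
            key = (rec["address"], rec["contact"])
        else:
            key = (rec["name"], rec["address"], rec["contact"])
        groups[key].append(rec["name"])
    return dict(groups)

# Made-up sample data mirroring the trust fund case above.
funds = [
    {"name": "x 001 trust fund", "address": "1 Main St", "contact": "J. Doe"},
    {"name": "x 002 trust fund", "address": "1 Main St", "contact": "J. Doe"},
    {"name": "x 003 trust fund", "address": "1 Main St", "contact": "J. Doe"},
]

print(len(group_entries(funds, bundle_by_site=True)))   # one site
print(len(group_entries(funds, bundle_by_site=False)))  # three identities
```

Which setting is right depends on what the list is used for, so it has to be decided before the cleaning run, not during it.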
Hope that is clearer - if not, ask for specifics.
regards
thomas