Need to find an algorithm to match people's identities
From: Charlie Schreiner, Myers and Stauffer Consulting, Topeka, Kansas, United States
Date: 18/08/2004 15:47:35
Forum: Visual FoxPro
Category: Coding, syntax and commands
Thread ID: 00934152
Message ID: 00934238
Hi Mariam,
Once you've standardized some of the data based on Dan's answer, you can judge how much of a mismatch you can tolerate. If all the data is in VFP and you have indexes on the fields of interest, code like the following is very fast. You could put the GetScore logic directly into the SQL statement; I grabbed this example from an old DOS program just to show the idea of adding up the matches to obtain a score.
* Determine an ID.
* Example:  ID = WhoAmI(SSN, BirthDate, Gender, FName, LName, Race, SuffName, MI, License) 
*---------------------------------------------------------------------------------
PARAMETERS SSN, BirthDate, Gender, FName, LName, ;
	Race, SuffName, MI, License
* Establish points for each match.
#DEFINE SSNMatch 30
#DEFINE LicenseNoMatch 30
#DEFINE GenderMatch 5
#DEFINE FNameMatch 4
#DEFINE LNameMatch 5
#DEFINE FNameSoundMatch 2
#DEFINE LNameSoundMatch 3
#DEFINE BirthMatch 6
#DEFINE RaceMatch 2
#DEFINE SuffNameMatch 1
#DEFINE MIMatch 2
* Possible 85 for a perfect match; the Soundex points count only when the exact name match fails.
#DEFINE IDThreshold 35   && If less than this number
                          && say it's an unknown person.
* Determine the best score for a match.
SELECT ID, SSN, BirthDate, Gender, FName, LName, ;
   Race, SuffName, MI, GetScore() AS SCORE ;
FROM SomeTable ;
INTO CURSOR PossibleMatches ;
WHERE SSN = m.SSN ;
   OR BirthDate = m.BirthDate ;
   OR UPPER(LName + FName) = UPPER(m.LName + m.FName) ;
   ORDER BY Score DESC

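IF PossibleMatches.Score >= IDThreshold	&& Is this score high enough to assume a match?
   RETURN PossibleMatches.ID   && The best-scoring record sorts first.
ENDIF
RETURN .NULL.   && Assumption: NULL signals an unknown person.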

PROCEDURE GetScore
LOCAL PointsScored

* Add up the points for every field that matches.
* Soundex points are awarded only when the exact name comparison fails.
PointsScored = IIF(SSN = m.SSN, SSNMatch, 0) ;
  + IIF(License = m.License, LicenseNoMatch, 0) ;
  + IIF(Gender = m.Gender, GenderMatch, 0) ;
  + IIF(UPPER(FName) = UPPER(m.FName), FNameMatch, 0) ;
  + IIF(UPPER(LName) = UPPER(m.LName), LNameMatch, 0) ;
  + IIF(UPPER(FName) # UPPER(m.FName) AND SOUNDEX(FName) = SOUNDEX(m.FName), FNameSoundMatch, 0) ;
  + IIF(UPPER(LName) # UPPER(m.LName) AND SOUNDEX(LName) = SOUNDEX(m.LName), LNameSoundMatch, 0) ;
  + IIF(BirthDate = m.BirthDate, BirthMatch, 0) ;
  + IIF(Race = m.Race, RaceMatch, 0) ;
  + IIF(SuffName = m.SuffName, SuffNameMatch, 0) ;
  + IIF(MI = m.MI, MIMatch, 0)
RETURN m.PointsScored
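
As mentioned above, the scoring can also be folded into the SQL statement itself. The following is a minimal sketch of that idea, not the original program: the cursor layout, the sample rows, and the lc/ld variable names are hypothetical, and only four of the match tests are shown to keep it short.

```foxpro
* Sketch only: hypothetical cursor layout and sample data.
#DEFINE LicenseNoMatch 30
#DEFINE GenderMatch 5
#DEFINE LNameMatch 5
#DEFINE BirthMatch 6
#DEFINE IDThreshold 35

CREATE CURSOR SomeTable (ID I, License C(12), Gender C(1), LName C(20), BirthDate D)
INSERT INTO SomeTable VALUES (1, "K0012345", "F", "SMITH", {^1970-03-15})
INSERT INTO SomeTable VALUES (2, "K0099999", "F", "SMYTHE", {^1970-03-15})

* The incoming record we want to identify (hypothetical values).
LOCAL lcLicense, lcGender, lcLName, ldBirth
lcLicense = "K0012345"
lcGender = "F"
lcLName = "Smith"
ldBirth = {^1970-03-15}

* The score expression is written inline instead of calling GetScore().
SELECT ID, ;
   IIF(License = m.lcLicense, LicenseNoMatch, 0) ;
   + IIF(Gender = m.lcGender, GenderMatch, 0) ;
   + IIF(UPPER(LName) = UPPER(m.lcLName), LNameMatch, 0) ;
   + IIF(BirthDate = m.ldBirth, BirthMatch, 0) AS Score ;
FROM SomeTable ;
WHERE License = m.lcLicense OR BirthDate = m.ldBirth ;
ORDER BY Score DESC ;
INTO CURSOR PossibleMatches

* The best candidate is the first row of the result.
IF PossibleMatches.Score >= IDThreshold
   ? "Match:", PossibleMatches.ID, PossibleMatches.Score
ELSE
   ? "Unknown person"
ENDIF
```

With these sample rows, the first record should score 30 + 5 + 5 + 6 = 46 and clear the threshold, while the second matches only on gender and birth date (5 + 6 = 11) and would be treated as a different person.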
>I tried several keyword searches, but I cannot find what I am looking for. I am looking for an algorithm to match people's identities that are being reported from different sources. We will have: first name, last name, address, city, state, zip, a birthdate, a gender, and an ID. The information is being reported from numerous retail outlets, so there are variations in how the information will be reported. Elements such as names could be spelled differently, and some items may be erroneous or missing in specific instances. The ID may be that of the customer or that of the person picking up for the customer. The ID is a government-issued ID such as a social security number, driver's license ID, or passport number, so even if it is the same person in different records, the ID reported may be different.
>
>TIA,
>Mariam
Charlie