Pattern Matching...
Message
From
05/01/2001 14:01:08
 
 
To
05/01/2001 13:38:47
General information
Forum:
Visual FoxPro
Category:
Other
Miscellaneous
Thread ID:
00459940
Message ID:
00459963
Views:
42
>Hello all...
>I have two tables. One contains search criteria. The other contains the text to be searched. The problem is that the search criteria are not going to match exactly. The second table is built through human data entry, so I get all kinds of weird stuff. Does anyone have any ideas or resources for a project of this nature?
>
>Thanks
>Tom Welch

Tom,

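Try running a SELECT with this function in the WHERE clause. On a large table it's going to be slow as hell, so what I usually do is run SOUNDEX() first to get a rough result set of candidate matches, then run this utility against that set. You can adjust the accuracy of the search with the parameters: tnMinMatch is the minimum match percentage (default 90), tlExact also scores the leftover characters in the longer string, and tlCaseSensitive turns off the UPPER() conversion.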
LPARAMETER tcString1, tcString2, tnMinMatch, tlExact, tlCaseSensitive

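LOCAL lnStr1Len, lnStr2Len, lnMinMatch, lcStringa, lcStringb, ;
	lnStrALen, lnStrBLen, lnSequence, lnPosb, lnOldPosb, lcChr, n, ;
	lnPercentSeq, lnPercenta, lnPercentb

*-Use the caller's threshold whenever a numeric third parameter is passed
*-(PCOUNT() = 3 would ignore it as soon as tlExact is also passed)
IF PCOUNT() >= 3 AND VARTYPE(tnMinMatch) = 'N'
	lnMinMatch = tnMinMatch
ELSE
	lnMinMatch = 90
ENDIF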

*-Remove all spaces and common punctuation (period, comma, hyphen)
tcString1 = STRTRAN(tcString1,' ','')
tcString2 = STRTRAN(tcString2,' ','')

tcString1 = STRTRAN(tcString1,'.','')
tcString2 = STRTRAN(tcString2,'.','')

tcString1 = STRTRAN(tcString1,',','')
tcString2 = STRTRAN(tcString2,',','')

tcString1 = STRTRAN(tcString1,'-','')
tcString2 = STRTRAN(tcString2,'-','')


*-Calculate length of each string
lnStr1Len = LEN(ALLT(tcString1))
lnStr2Len = LEN(ALLT(tcString2))

IF !tlCaseSensitive
	tcString1 = UPPER(tcString1)
	tcString2 = UPPER(tcString2)
ENDIF

*-Identify the shorter and larger strings
IF lnStr1Len <= lnStr2Len
	lcStringa = ALLT(tcString1)
	lcStringb = ALLT(tcString2)
	lnStrALen = lnStr1Len
	lnStrBLen = lnStr2Len
ELSE
	lcStringa = ALLT(tcString2)
	lcStringb = ALLT(tcString1)
	lnStrALen = lnStr2Len
	lnStrBLen = lnStr1Len
ENDIF

*-Compare shorter against longer string
*-Substitute every character match with space
lnSequence = 0
lnPosb = 0
FOR n = 1 TO lnStrALen
	*-Read a character from the shorter string
	lcChr = SUBSTR(lcStringa, N, 1)
	*-Save the position of the previous character in the Searched string
	lnOldPosb = lnPosb
	
	*-The same character may be found in different places in the same string.  So check if this
	*-character matches the sequence in the Searched string (b) by reading the character following
	*-the previous one (lnOldPosb). First make sure that lnOldPosb is not the last position
	*-to avoid a 'Cannot Access Character Beyond String' error message.
	IF lnOldPosb < lnStrBLen AND lcChr = SUBSTR(lcStringb, lnOldPosb + 1, 1)
		lnPosb = lnOldPosb + 1
	ELSE
		*-Get the first position of the current character in the searched string
		lnPosb = AT(lcChr, lcStringb)
	ENDIF
	
	*-If the current character is just after the previous character, increment sequence match by one.
	*-If the current character is somewhere after the previous character, increment sequence match by 0.20
	DO CASE
		CASE lnPosb - lnOldPosb = 1 
			lnSequence = lnSequence + 1
		CASE lnPosb - lnOldPosb > 1
			lnSequence = lnSequence + .20
	ENDCASE
	
	*-If the character was found in the searched string, replace it with a blank according
	*-to its position (lnPosb).  STUFF() must be used.
	*-The Search string (a) is read sequentially, so the character will always be found at
	*-its first occurrence.  STRTRAN() can be used.
	IF lnPosb > 0
		lcStringb = STUFF(lcStringb, lnPosB, 1, ' ')
		lcStringa = STRTRAN(lcStringa, lcChr, ' ', 1, 1)
	ENDIF
ENDFOR


*-Remove all spaces substituted in the character match process above
lcStringa = STRTRAN(lcStringa,' ','')
lcStringb = STRTRAN(lcStringb,' ','')

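*-Nothing is left to compare if a string was empty after stripping; avoid dividing by zero
IF lnStrALen = 0
	RETURN .F.
ENDIF

lnPercentSeq = lnSequence * 100 / lnStrALen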

IF tlExact
	*-Get the character match in percent for string a and string b.
	lnPercenta = 100 - (LEN(lcStringa) * 100 / lnStrALen)
	lnPercentb = 100 - (LEN(lcStringb) * 100 / lnStrBLen)

	*?MIN(lnPercenta, lnPercentb), lnPercenta, lnPercentb, lcStringa, lcStringb
	*-Return whether the average of the character match and the sequence order match meets the minimum
	RETURN ( MIN(lnPercenta, lnPercentb) + lnPercentSeq ) / 2 >= lnMinMatch
ELSE
	*-Get the character match in percent for the shorter string only.
	lnPercenta = 100 - (LEN(lcStringa) * 100 / lnStrALen)
	*?lnPercenta, lcStringa, lcStringb
	*-Return whether the average of the character match and the sequence order match meets the minimum
	RETURN (lnPercenta + lnPercentSeq) / 2 >= lnMinMatch
ENDIF
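
Here is a rough sketch of the two-step approach described above: SOUNDEX() first, then the function. It assumes the code above is saved as FuzzyMatch.prg somewhere on your path; that name, the tables (criteria, entries) and the fields (cTerm, cText) are made up for illustration, so substitute your own.

```foxpro
*-Step 1: cheap prefilter. Keep only the pairs whose SOUNDEX() codes agree.
*-criteria.cTerm and entries.cText are hypothetical names.
SELECT c.cTerm, e.cText ;
	FROM criteria c, entries e ;
	WHERE SOUNDEX(c.cTerm) == SOUNDEX(e.cText) ;
	INTO CURSOR curCandidates

*-Step 2: run the slower fuzzy comparison on the reduced set only.
*-FuzzyMatch is the assumed name of the routine above; 80 is the minimum
*-match percentage (leave it out to get the default of 90).
SELECT cTerm, cText ;
	FROM curCandidates ;
	WHERE FuzzyMatch(cTerm, cText, 80) ;
	INTO CURSOR curMatches

*-Quick check from the Command window
? FuzzyMatch("ABC", "ABXC", 80)
? FuzzyMatch("ABC", "ABXC")
```

Working the last two calls through by hand: A and B match in sequence (1 point each) and C is found one position late (0.20), so the sequence score is 2.2 out of 3, or about 73%. Every character of the shorter string was found, so the character match is 100%. The average is about 86.7%, which passes a threshold of 80 but not the default of 90, so the first call should print .T. and the second .F. Keep in mind that SOUNDEX() depends heavily on the first letter, so a typo at the start of a word can drop a row before the function ever sees it.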