Pattern Matching... - Level Extreme



From: James Weil, Flushing, New York, United States (05/01/2001 14:01:08)
In reply to: Tom Welch, Big Champagne LLC, Atlanta, Georgia, United States (05/01/2001 13:38:47)

Forum: Visual FoxPro
Category: Other
Title: Re: Pattern Matching...
Thread ID: 00459940
Message ID: 00459963

>Hello all...
>I have two tables. One contains the search criteria; the other contains the text to be searched. The problem is that the search criteria are not going to match exactly. The second table is built through human data entry, so I get all kinds of weird stuff. Anyone have any ideas or resources for a project of this nature?
>
>Thanks
>Tom Welch

Tom,

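Try running a SELECT with this function in the WHERE clause. On a large table it's going to be very slow, so what I usually do is run a SOUNDEX() pass first to get a rough result set of likely matches, then run this function against that smaller set. The function scores a pair of strings by averaging two percentages: how many characters of the shorter string were found in the longer one, and how well those characters stayed in sequence. You can adjust the accuracy of the search with the parameters:

- tnMinMatch: minimum score (0 to 100) required for a match; defaults to 90
- tlExact: also count the unmatched characters of the longer string, using the lower of the two character percentages
- tlCaseSensitive: compare case-sensitively (by default both strings are uppercased first)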

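LPARAMETERS tcString1, tcString2, tnMinMatch, tlExact, tlCaseSensitive

LOCAL lnStr1Len, lnStr2Len, lnMinMatch, lcStringa, lcStringb, lnStrALen, lnStrBLen, ;
	lnSequence, lnPosb, lnOldPosb, lcChr, n, lnPercentSeq, lnPercenta, lnPercentb

*-Minimum score (0-100) needed for a match; default to 90 when tnMinMatch is not passed.
*-Parameters that are not passed arrive as .F., so check the type of tnMinMatch itself.
IF VARTYPE(tnMinMatch) = "N"
	lnMinMatch = tnMinMatch
ELSE
	lnMinMatch = 90
ENDIF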

*-Remove all spaces and punctuation. CHRTRAN() deletes every character in the
*-second argument that has no counterpart in the (empty) third argument.
tcString1 = CHRTRAN(tcString1, ' .,-', '')
tcString2 = CHRTRAN(tcString2, ' .,-', '')


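*-Calculate the length of each string
lnStr1Len = LEN(ALLTRIM(tcString1))
lnStr2Len = LEN(ALLTRIM(tcString2))

*-Nothing left to compare; the percentages below would divide by zero
IF MIN(lnStr1Len, lnStr2Len) = 0
	RETURN .F.
ENDIF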

IF !tlCaseSensitive
	tcString1 = UPPER(tcString1)
	tcString2 = UPPER(tcString2)
ENDIF

*-Identify the shorter and larger strings
IF lnStr1Len <= lnStr2Len
	lcStringa = ALLT(tcString1)
	lcStringb = ALLT(tcString2)
	lnStrALen = lnStr1Len
	lnStrBLen = lnStr2Len
ELSE
	lcStringa = ALLT(tcString2)
	lcStringb = ALLT(tcString1)
	lnStrALen = lnStr2Len
	lnStrBLen = lnStr1Len
ENDIF

*-Compare the shorter string against the longer one,
*-replacing every matched character with a space
lnSequence = 0
lnPosb = 0
FOR n = 1 TO lnStrALen
	*-Read a character from the shorter string
	lcChr = SUBSTR(lcStringa, n, 1)
	*-Save the position of the previous match in the searched string (b)
	lnOldPosb = lnPosb
	
	*-The same character may occur in several places in the searched string, so first check
	*-whether it continues the current sequence, i.e. whether it is the character right after
	*-the previous match (lnOldPosb). The lnOldPosb < lnStrBLen test keeps the lookup inside
	*-the string, and == makes the comparison exact regardless of SET EXACT.
	IF lnOldPosb < lnStrBLen AND lcChr == SUBSTR(lcStringb, lnOldPosb + 1, 1)
		lnPosb = lnOldPosb + 1
	ELSE
		*-Otherwise take the first remaining occurrence of the character
		lnPosb = AT(lcChr, lcStringb)
	ENDIF
	
	*-Right after the previous match: add 1 to the sequence score.
	*-Somewhere further along: add 0.20. Earlier, or not found at all: add nothing.
	DO CASE
		CASE lnPosb - lnOldPosb = 1
			lnSequence = lnSequence + 1
		CASE lnPosb - lnOldPosb > 1
			lnSequence = lnSequence + .20
	ENDCASE
	
	*-If the character was found, blank it out at its exact position in both strings
	*-(lnPosb in b, n in a) so that it cannot be matched a second time.
	IF lnPosb > 0
		lcStringb = STUFF(lcStringb, lnPosb, 1, ' ')
		lcStringa = STUFF(lcStringa, n, 1, ' ')
	ENDIF
ENDFOR


*-Remove the spaces left by the matching loop; what remains are the unmatched characters
lcStringa = STRTRAN(lcStringa, ' ', '')
lcStringb = STRTRAN(lcStringb, ' ', '')

*-Sequence score as a percentage of the shorter string's length
lnPercentSeq = lnSequence * 100 / lnStrALen

*-Percentage of the shorter string's characters that were matched
lnPercenta = 100 - (LEN(lcStringa) * 100 / lnStrALen)

IF tlExact
	*-Also measure how much of the longer string was matched and use the lower percentage
	lnPercentb = 100 - (LEN(lcStringb) * 100 / lnStrBLen)

	*-Return whether the average of the character match and the sequence match reaches the threshold
	RETURN (MIN(lnPercenta, lnPercentb) + lnPercentSeq) / 2 >= lnMinMatch
ELSE
	*-Return whether the average of the character match and the sequence match reaches the threshold
	RETURN (lnPercenta + lnPercentSeq) / 2 >= lnMinMatch
ENDIF
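As a worked example of the scoring, traced by hand through the code above with the default settings (tlExact and tlCaseSensitive not passed), compare "Big Champagne LLC" with "Big Champaign LLC". After normalization both are 15 characters: BIGCHAMPAGNELLC and BIGCHAMPAIGNLLC. Twelve characters are found right after the previous match (+1 each); the G and the first L are found further along (+0.20 each), the L because the E before it was never found; the E is the only unmatched character of the shorter string.

$$
\begin{aligned}
S &= 12 \times 1 + 2 \times 0.20 = 12.4 \\
P_{seq} &= \frac{100\,S}{15} = \frac{1240}{15} \approx 82.67 \\
P_a &= 100 - \frac{100 \times 1}{15} \approx 93.33 \\
\text{score} &= \frac{P_a + P_{seq}}{2} = \frac{176}{2} = 88
\end{aligned}
$$

88 is below the default threshold of 90, so this pair is rejected with the defaults and accepted when tnMinMatch is 80.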

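The two-pass approach from the start of this reply could look roughly like the sketch below. It assumes the function above is saved as FuzzyMatch.prg (a hypothetical name) somewhere on the VFP search path, and the crit/entries cursors and their cName/cText fields are made-up sample data. SOUNDEX() is cheap but coarse, so it only narrows the set; FuzzyMatch() then makes the final call on the survivors.

```foxpro
*-Sketch only: FuzzyMatch.prg is a hypothetical name for the function above,
*-and these cursors/fields are hypothetical sample data.
CREATE CURSOR crit (cName C(40))
INSERT INTO crit VALUES ("Big Champagne LLC")

CREATE CURSOR entries (cText C(40))
INSERT INTO entries VALUES ("Big Champaign, L.L.C.")
INSERT INTO entries VALUES ("Bigg Champagne LLC")
INSERT INTO entries VALUES ("Flushing Hardware")

*-Pass 1: cheap phonetic prefilter, keeps only pairs with the same SOUNDEX() code
SELECT cr.cName, en.cText ;
	FROM crit cr, entries en ;
	WHERE SOUNDEX(cr.cName) == SOUNDEX(en.cText) ;
	INTO CURSOR candidates

*-Pass 2: the slower character comparison, run only on the pass-1 survivors
SELECT cName, cText ;
	FROM candidates ;
	WHERE FuzzyMatch(cName, cText, 80) ;
	INTO CURSOR matches

*-Show what was matched (matches is the current work area after the SELECT)
SCAN
	? ALLTRIM(cName), "->", ALLTRIM(cText)
ENDSCAN
```

With this sample data the first two entries should survive both passes, and "Flushing Hardware" should be dropped by the SOUNDEX() pass. Raise or lower the 80 to tighten or loosen the second pass.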