Get words and spaces from a string
Message
From
21/02/2011 12:39:10
 
 
To
All
General information
Forum:
Visual FoxPro
Category:
Coding, syntax & commands
Miscellaneous
Thread ID:
01501103
Message ID:
01501103
Views:
187
I need to do "fuzzy" searches on strings (multi-word business names) to match similar names. I have been experimenting with the Levenshtein algorithm and other matching algorithms. Levenshtein is too slow and only seems to work well with misspelled single words.

I think it will be helpful to evaluate how many words match in a given string, and ideally the order in which they match. For example, for the string HACKENSACK UNIVERSITY HOSPITAL, HACKENSACK UNIVERSITY would be a better match than HACKENSACK HOSPITAL, if we consider that a match on the first two words means more than a match on the first and third. I think this would USUALLY hold true for business name searches. Ironically, in the example I just gave, the second string is more likely to be the right match. Even so, I want to prioritize multi-word matches by how many words match, with priority given to words matching in order.
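
To make the ranking idea concrete, here is a minimal sketch, not a finished routine. The function name WordMatchScore and the weighting (10 points per matching word, 1 extra point per leading word matching in order) are assumptions made up for illustration; it also assumes space-delimited words and a case-insensitive comparison.

```foxpro
* Compare two candidates against the search string.
? WordMatchScore("HACKENSACK UNIVERSITY HOSPITAL", "HACKENSACK UNIVERSITY")  && 22
? WordMatchScore("HACKENSACK UNIVERSITY HOSPITAL", "HACKENSACK HOSPITAL")    && 21

* Hypothetical helper: scores tc_candidate against tc_search.
* The weights are arbitrary and only there to rank "in order" matches higher.
FUNCTION WordMatchScore
LPARAMETERS tc_search, tc_candidate
LOCAL ln_i, ln_total, ln_in_order, ll_prefix, lc_word, lc_padded
ln_total = 0       && words of tc_search found anywhere in tc_candidate
ln_in_order = 0    && leading words that match position by position
ll_prefix = .T.    && stays .T. while the leading words keep matching
lc_padded = " " + UPPER(tc_candidate) + " "
FOR ln_i = 1 TO GETWORDCOUNT(tc_search, " ")
	lc_word = UPPER(GETWORDNUM(tc_search, ln_i, " "))
	* Same word in the same position, with no mismatch so far
	IF ll_prefix AND lc_word == UPPER(GETWORDNUM(tc_candidate, ln_i, " "))
		ln_in_order = ln_in_order + 1
	ELSE
		ll_prefix = .F.
	ENDIF
	* Whole-word test: pad with spaces so HOSP does not match HOSPITAL
	IF (" " + lc_word + " ") $ lc_padded
		ln_total = ln_total + 1
	ENDIF
ENDFOR
RETURN ln_total * 10 + ln_in_order
ENDFUNC
```

With these weights both candidates match two words, but HACKENSACK UNIVERSITY ranks higher because its two words match in order. The weights would need tuning against real data.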

Given this, I wanted to be able to go beyond the GETWORDNUM() function. I wanted to be able to get the word count, the length of each word, and the number of spaces between each word. If I don't have an exact match on, for example, a four-word string, I want to first check for a match on the first three words, then on the first two if that doesn't match, and so on. The only way I could figure out to do this was to develop the function below. It will enable me to concatenate the search string in the ways I need to.
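
For comparison, the "first four words, then three, then two" fallback can be roughed out with just the built-in GETWORDCOUNT() and GETWORDNUM(). This is only a sketch: FirstWords is a hypothetical helper name, the sample string is made up, and it joins words with single spaces, so it does not preserve the original spacing the way my function is meant to.

```foxpro
* Build progressively shorter search strings from the front of the name.
LOCAL lc_search, ln_n
lc_search = "HACKENSACK UNIVERSITY MEDICAL CENTER"   && made-up example
FOR ln_n = GETWORDCOUNT(lc_search, " ") TO 1 STEP -1
	? FirstWords(lc_search, ln_n)   && the lookup for each candidate would go here
ENDFOR

* Hypothetical helper: returns the first tn_words words of tc_string,
* joined by single spaces (runs of spaces are collapsed).
FUNCTION FirstWords
LPARAMETERS tc_string, tn_words
LOCAL ln_i, lc_result
lc_result = ""
FOR ln_i = 1 TO MIN(tn_words, GETWORDCOUNT(tc_string, " "))
	lc_result = lc_result + IIF(ln_i > 1, " ", "") + GETWORDNUM(tc_string, ln_i, " ")
ENDFOR
RETURN lc_result
ENDFUNC
```

The loop prints the four-word string first, then the first three words, the first two, and finally the first word alone.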

I have tested this on a number of different strings... seems to be working. I'm sure though that someone has done a better job of this than me, or that there are libraries of string handling functions out there that are better than this. Note that I haven't handled other delimiters, nulls, and I'm sure a lot else.

Any feedback or help on this would be appreciated.
LPARAMETERS tc_string
* SET STEP ON  && debugging breakpoint; uncomment to trace in the debugger
LOCAL lc_word, lc_char, ln_space_count, ln_word_count, ln_chars_in_word, ln_string_len, i
* One row is added for each run of spaces that precedes a word (cword empty, nnumspaces set)
* and one row for each word (cword, ncount and nwordlen set).
CREATE CURSOR cur_words (cwholestring char(250), cword char(100), ncount integer, nnumspaces integer, nwordlen integer)
lc_word=''
ln_space_count=0
ln_word_count=0
ln_chars_in_word=0
IF RIGHT(tc_string,1) <> CHR(32) && force the last character to be a space to simplify the logic
	tc_string=tc_string+SPACE(1)
ENDIF
ln_string_len=LEN(tc_string)
FOR i = 1 TO ln_string_len
	lc_char=SUBSTR(tc_string,i,1)
	IF ASC(lc_char) = 32
		IF ln_chars_in_word<>0  && a word just ended (skipped for spaces at the beginning of the string)
			ln_word_count=ln_word_count+1
			APPEND BLANK IN cur_words
			REPLACE cwholestring WITH tc_string, cword WITH lc_word, ncount WITH ln_word_count, nwordlen WITH ln_chars_in_word IN cur_words
			ln_chars_in_word=0
			lc_word=''
		ENDIF
		ln_space_count=ln_space_count+1
	ELSE
		IF ln_chars_in_word=0 && first character after a space
			APPEND BLANK IN cur_words
			REPLACE cwholestring WITH tc_string, cword WITH '', nnumspaces WITH ln_space_count IN cur_words
			ln_space_count=0
		ENDIF
		ln_chars_in_word=ln_chars_in_word+1
		lc_word=lc_word+lc_char
	ENDIF
ENDFOR