Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
Get words and spaces from a string
Message
From
21/02/2011 16:25:10
 
 
To
21/02/2011 12:39:10
General information
Forum:
Visual FoxPro
Category:
Coding, syntax & commands
Miscellaneous
Thread ID:
01501103
Message ID:
01501170
Views:
86
>I need to do "fuzzy searches on strings (multi word business names). to match similar names. I have been experimenting with the levenshtein algorithm and other matching algorithms. Levenshtein is too slow and only seems to work well with misspelled single words.
>
>I think it will be helpful to evaluate how many words match in a given string, and ideally the order in which they match, e..g
>regarding the string, HACKENSACK UNIVERSITY HOSPITAL, HACKENSACK UNIVERSITY would be a better match than
>HACKENSACK HOSPITAL, if we consider that the first two words matching means more than the first and third. I think that USUALLY this would hold true for many business name searches. Ironically though in the example I just gave, the second string is more likely to be a match. Even so, I want to prioritize multi word matches by how many words match, with a priority given to the words matching in order.
>
>Given this I wanted to be able to go beyond the getwordnum() function. I wanted to be able to get the word count, the length of each word, and the number of spaces between each word. If I don't have an exact match, for example, on a four word string, I want to first check for a match on the first three words, then the first two if that doesn't match, and so on. The only way I could figure out to do this was to develop the function below. It will enable me to concatenate the search string in the ways I need to.
>
>I have tested this on a number of different strings... seems to be working. I'm sure though that someone has done a better job of this than me, or that there are libraries of string handling functions out there that are better than this. Note that I haven't handled other delimiters, nulls, and I'm sure a lot else.
>
>Any feedback or help on this would be appreciated.
>
>
>
>
>
>LPARAMETERS tc_string
>SET STEP ON 
>CREATE CURSOR cur_words (cwholestring char(250), cword char(100), ncount integer, nnumspaces integer, Nwordlen integer ) 
>lc_string=''
>lc_word=''
>ln_space_count=0
>ln_non_space_count=0
>lc_spaces=''
>ln_word_count=0
>ln_chars_in_word=0
>IF RIGHT(tc_string,1) <> CHR(32) && force the last character to be a space to simplify the logic and 
>	tc_string=tc_string+ SPACE(1) 
>endif
>ln_string_len=LEN(tc_string)
>FOR i = 1 TO ln_string_len
>	lc_char=SUBSTR(tc_string,i,1)
>	IF asc(lc_char) = 32
>			IF ln_non_space_count<>0  && handle spaces at the beginning of the string
>				ln_word_count=ln_word_count+1
>				
>				APPEND blank IN cur_words
>				replace cwholestring with tc_string in cur_words
>				replace cword WITH lc_word IN cur_words
>				replace nwordlen WITH ln_chars_in_word IN cur_words
>				ln_chars_in_word=0
>				ln_non_space_count=0
>				lc_word=''
>			ENDIF
>			lc_spaces=lc_spaces+" "	
>			ln_space_count=ln_space_count+1
>	ELSE
>	
>			IF ln_chars_in_word=0 && first character after a space
>				APPEND blank IN cur_words
>				replace cwholestring with tc_string in cur_words
>				replace cword WITH '' IN cur_words
>				replace nnumspaces WITH ln_space_count IN cur_words
>				ln_space_count=0
>			endif
>			ln_chars_in_word=ln_chars_in_word+1
>			lc_word=lc_word+lc_char		
>			ln_non_space_count=ln_non_space_count+1
>			
>			
>			
>	endif
>
>endfor	
>
>
If your code is working, what improvements are you looking for?

If you're looking for different approaches here are a couple of things to look at:

- Text-search engines such as SQL Server full-text search
- If you want your searches to work "like Google" you could consider the Google Search Appliance: http://www.google.com/enterprise/search/gsa.html
Regards. Al

"Violence is the last refuge of the incompetent." -- Isaac Asimov
"Never let your sense of morals prevent you from doing what is right." -- Isaac Asimov

Neither a despot, nor a doormat, be

Every app wants to be a database app when it grows up
Previous
Next
Reply
Map
View

Click here to load this message in the networking platform