Get words and spaces from a string


Message

From

24/02/2011 09:42:51

David Schlesinger
No Company
Elmwood Park, New Jersey, United States

21/02/2011 20:44:11

Al Doman
M3 Enterprises Inc.
North Vancouver, British Columbia, Canada

General information

Forum:

Visual FoxPro

Category:

Coding, syntax & commands

Title:

Re: Get words and spaces from a string

Miscellaneous

Thread ID:

01501103

Message ID:

01501622


Thanks Al.

Regarding an incremental search: I will add this to the interface as the "search of last resort". I want the system to find as many matches as possible in advance. If there are no initial results, or none that are suitable, the user can fall back to the incremental search. I think the incremental search should be a "like" or "contains" search, or the user could choose between "=" and "contains".
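To make the "starts with" versus "contains" choice concrete, here is a minimal sketch. It is written in Python rather than VFP, purely as an illustration; the function name `incremental_search` and the `mode` parameter are made up for this example.

```python
def incremental_search(names, typed, mode="starts"):
    """Return the names that match what the user has typed so far.

    mode="starts"   -> the name begins with the typed text
    mode="contains" -> the typed text appears anywhere in the name
    Matching is case-insensitive. Empty input returns every name.
    """
    needle = typed.strip().lower()
    if not needle:
        return list(names)
    if mode == "starts":
        return [n for n in names if n.lower().startswith(needle)]
    if mode == "contains":
        return [n for n in names if needle in n.lower()]
    raise ValueError("mode must be 'starts' or 'contains'")


# Sample data borrowed from the example quoted further down.
businesses = [
    "Hackensack University",
    "Hackensack University Hospital",
    "University of Hackensack",
]
```

The function would be called again on every keystroke with the text typed so far. With only about 1500 names, a plain scan like this should be fast enough.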

What you suggest below is mostly what I am doing, though I am still tweaking it. Exact matches are handled automatically. Multi-word matches (phrases) are also searched for, and I do them first, on the assumption that they are better matches than "includes the word or words". This is what my word parsing function helped with. I still need to work on this, though. Say the search string has four words: I first search for data that contains the four-word phrase. If it is found, I exit and don't search any further; if not, I search on the first three words of the phrase, then the first two, and so on. However, I think it still needs more work. For example, I don't try all phrase combinations, such as words 2, 3 and 4, or words 2 and 3. And I don't yet filter out duplicate or similar results, as you suggest below.
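One possible way to cover the missing combinations is to generate every contiguous sub-phrase, longest first, and stop at the first one that produces hits. This is only a sketch in Python, not my actual parsing function; `sub_phrases` and `first_phrase_hits` are hypothetical names, and the matching is a plain case-insensitive substring test that ignores word boundaries.

```python
def sub_phrases(search_text):
    """List every contiguous run of words, longest first.

    "a b c" -> ["a b c", "a b", "b c", "a", "b", "c"]
    """
    words = search_text.split()
    n = len(words)
    phrases = []
    for length in range(n, 0, -1):              # longest phrases first
        for start in range(0, n - length + 1):  # slide across the words
            phrases.append(" ".join(words[start:start + length]))
    return phrases


def first_phrase_hits(names, search_text):
    """Return (phrase, matching names) for the first sub-phrase with hits."""
    for phrase in sub_phrases(search_text):
        hits = [name for name in names if phrase.lower() in name.lower()]
        if hits:
            return phrase, hits
    return None, []
```

A four-word search string produces 4 + 3 + 2 + 1 = 10 sub-phrases, so the extra work over the "first three, first two" approach is small.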

Ideally, once the phrases and words are evaluated, a lower-priority search would find all permutations of words or phrases with similar or misspelled words. However, the Levenshtein algorithm I used for this seems far too slow for a "real time" application. I am guessing that the next step would be an indexed dictionary of words and their common misspellings and variations. Any feedback on this piece would be appreciated as well.
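As a rough idea of what such an index could look like, the sketch below stores each known word under every variant obtained by deleting one character, and looks a typed word up the same way. This is a swapped-in technique (a deletion index), not the Levenshtein code I mentioned, and it is in Python only for illustration. The names `deletions`, `build_index` and `candidates` are hypothetical. It returns candidates only: anything it finds is within two single-character edits of the typed word, so a real distance check could still be run on the short candidate list.

```python
from collections import defaultdict


def deletions(word):
    """The word itself plus every variant with one character removed."""
    return {word} | {word[:i] + word[i + 1:] for i in range(len(word))}


def build_index(dictionary_words):
    """Map each deletion variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary_words:
        w = word.lower()
        for variant in deletions(w):
            index[variant].add(w)
    return index


def candidates(index, typed):
    """Dictionary words that share a deletion variant with the typed word."""
    found = set()
    for variant in deletions(typed.lower()):
        found |= index.get(variant, set())
    return found


# Built once, up front; lookups afterwards are just dictionary probes.
index = build_index(["hackensack", "university", "hospital"])
```

The cost moves to building the index once. Each lookup only probes one key per character of the typed word, instead of comparing against every word in the dictionary.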

Regarding regular expressions, I used them years ago when I did a little Unix work with grep, but my knowledge is very limited. I didn't know there was a RegEx integration for VFP. I will check it out.
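For what it is worth, the original problem of getting the words and the spaces out of a string looks short as a regular expression. The sketch below uses Python's `re` module, not the VFP integration mentioned in the quoted message, so treat it only as an illustration of the pattern; `words_and_spaces` is a made-up name.

```python
import re


def words_and_spaces(text):
    """Split text into alternating runs of non-whitespace and whitespace.

    Nothing is discarded, so joining the pieces rebuilds the original string.
    """
    return re.findall(r"\S+|\s+", text)


pieces = words_and_spaces("Hackensack  University Hospital")
```

Here `pieces` holds the three words with the whitespace runs between them kept as separate items, including the double space.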

>
>Thanks for the clarification. I don't have any code samples that could help you on your current path. However I have encountered a couple of alternate approaches. Both of them involve feedback from the user.
>
>1. (specialized) Simple incremental search. If you assume the users always know the start, or first few letters of the business name an incremental search with a grid, listbox or combo can work well. Fifteen hundred businesses is not a lot, on modern hardware incremental search can be basically instantaneous. Once users are accustomed to how it works they can zero in on the one they want very quickly.
>
>2. (more general) Display detailed search results and let the user choose. Suppose the user searches on "Hackensack University". You might be able to present results like this:
>

>Your search string: [Hackensack University]
>
>Exact Matches: 1
>  Hackensack University
>
>Includes phrase [Hackensack University]: 2
>  Hackensack University
>  Hackensack University Hospital
>
>Includes 2 words [Hackensack][University]: 4
>  Hackensack University
>  Hackensack University Hospital
>  University of Hackensack
>  University of Hackensack School of Medicine
>
>Includes 1 word [Hackensack]: xxx
>  ...
>
>Includes 1 word [University]: yyy
>  ...
>

>It would probably be a good idea to filter each "section" above so that results already appearing in a "higher" section don't appear again in the lower ones.
>
>Another way to present the output of the second approach is to assign a numeric "relevance" score to each of the results, then order descending. The simple example above, if filtered, might look something like this:
>

>Your search string: [Hackensack University]
>
>100: Hackensack University
> 90: Hackensack University Hospital
> 50: University of Hackensack
> 50: University of Hackensack School of Medicine
> 25: xxx entries that contain [Hackensack]
> 25: yyy entries that contain [University]
>

>An option for the above would be to give the user a field that says "Only show results with zz% or higher relevance".
>
>Yet another idea is to look into regular expressions (RegEx). I don't know how to use them (I consider that a shortcoming in my development skills) but I understand they are very powerful for text parsing, and I believe the libraries are highly optimized on most OSs. RegEx is callable/available from VFP, ISTR some threads here from time to time showing how to use them. There are also some results on the Fox Wiki e.g. http://fox.wikis.com/wc.dll?Wiki~RegExp~VFP .
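The relevance idea quoted above could be sketched roughly as follows. This is Python for illustration only. The score values 100, 90, 50 and 25 are taken from the example in the quoted message, while the function names, the `minimum` parameter and the extra sample names marked below are assumptions. Because each name receives only its highest score, a result never shows up again in a lower section.

```python
def relevance(name, search_text):
    """Score one name against the search text, highest tier only."""
    name_l = name.lower()
    query = " ".join(search_text.lower().split())  # collapse extra spaces
    words = query.split()
    if not words:
        return 0
    if name_l == query:
        return 100                                 # exact match
    if query in name_l:
        return 90                                  # contains the whole phrase
    name_words = name_l.split()
    matched = [w for w in words if w in name_words]
    if len(matched) == len(words):
        return 50                                  # contains every word
    if matched:
        return 25                                  # contains at least one word
    return 0


def ranked_results(names, search_text, minimum=0):
    """Return (score, name) pairs, best first, dropping non-matches."""
    scored = [(relevance(n, search_text), n) for n in names]
    scored = [(s, n) for s, n in scored if s > 0 and s >= minimum]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


sample = [
    "University of Hackensack School of Medicine",
    "Hackensack University Hospital",
    "Hackensack Diner",        # hypothetical entry, not from the thread
    "University of Hackensack",
    "Rutgers University",      # hypothetical entry, not from the thread
    "Hackensack University",
    "Elmwood Bakery",          # hypothetical entry, not from the thread
]
results = ranked_results(sample, "Hackensack University")
```

Passing `minimum=50` gives the "only show results with zz% or higher relevance" option.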
