I am nowhere near a perfect solution yet, but based on what little I have learned and observed from MS Indexing Service, PHDBase, and MSSQL2K, here is my approach for VFP.
I have a "main" table with a unique row ID; this holds my actual data. I then create a "keyword" table whose iRowID points back to the main table, plus a table of "noise words".
I scan the main table, reading through the column with the actual content, and use the noise words table to strip out words that would throw off the search - "a", "the", "that", and so on. I also convert the entire string to uppercase. I insert the leftover string into the keyword table, then strip off the first word, append it to the end of the string, and insert that rotation as well, repeating until every word has led the string once. So there will be n entries in the keyword table for every single entry in the main table, and the tables would look something like this:
main.dbf
iRowID   cText
1        Universal Thread Messages
2        Steve Craft

keywords.dbf
iRowID   cText
1        Universal Thread Messages
1        Thread Messages Universal
1        Messages Universal Thread
2        Steve Craft
2        Craft Steve
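The scan-and-rotate step above could be sketched like this (Python standing in for the VFP scan loop, since I can't paste the real code here; NOISE_WORDS and keyword_rows are names I made up for illustration, and I uppercase the stored string up front rather than in the index expression):

```python
NOISE_WORDS = {"A", "AN", "THE", "THAT"}  # illustrative noise-word list

def keyword_rows(row_id, text):
    """Strip noise words, uppercase, then emit one keyword row per
    left rotation of the remaining words."""
    words = [w for w in text.upper().split() if w not in NOISE_WORDS]
    rows = []
    for i in range(len(words)):
        rotated = words[i:] + words[:i]   # move the first i words to the end
        rows.append((row_id, " ".join(rotated)))
    return rows

# Row 1 of main.dbf yields three keyword rows, one per rotation:
# (1, 'UNIVERSAL THREAD MESSAGES'), (1, 'THREAD MESSAGES UNIVERSAL'),
# (1, 'MESSAGES UNIVERSAL THREAD')
for rid, kw in keyword_rows(1, "Universal Thread Messages"):
    print(rid, kw)
```

The point of the rotations is that any word of the original string can now be found with a left-anchored (and therefore index-friendly) match.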
I have an index on both columns in both tables.
If the search phrase is "universal messages", I take the first word, convert it to uppercase (to match the index expression in keywords.dbf), wash it through the noise words table, and SQL-Select the top 500 matching rows into an array. I also build a "like" string for my final SQL statement (more on that in a minute). I then do the same for every other word, so in the end I will have n arrays (I have a limit of 10 keywords in this KWIC-style search) of no more than 500 elements each.
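The per-word harvesting might look like this in Python (an illustrative stand-in for the VFP Top 500 queries; KEYWORDS, TOP_N, and candidate_ids are hypothetical names, and a list comprehension stands in for the indexed SELECT):

```python
TOP_N = 500  # mirrors the Top 500 clause: cap every candidate set

# hypothetical in-memory stand-in for keywords.dbf (iRowID, cText)
KEYWORDS = [
    (1, "UNIVERSAL THREAD MESSAGES"),
    (1, "THREAD MESSAGES UNIVERSAL"),
    (1, "MESSAGES UNIVERSAL THREAD"),
    (2, "STEVE CRAFT"),
    (2, "CRAFT STEVE"),
]

def candidate_ids(word):
    """Left-anchored match on the rotated strings - the index-friendly
    part of the search - capped at TOP_N rows."""
    w = word.upper()
    return [rid for rid, kw in KEYWORDS if kw.startswith(w)][:TOP_N]

print(candidate_ids("universal"))  # row 1, via the un-rotated entry
print(candidate_ids("messages"))   # row 1, via a rotated entry
```

Because of the rotations, "messages" hits row 1 even though it is not the first word of the original text.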
I then create a simple one-column cursor of iRowID and append from each array into the cursor. Finally I create a dynamic SQL string using the "like" string I built earlier (e.g. cWhereLike = cWhereLike + 'cText Like "%' + cCurrWord + '%" and '), again including a Top 500 clause and a Group By clause on iRowID. In the end, I end up with a very small set of iRowID values that I can point back at the original main.dbf.
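The final combine step can be sketched the same way (again Python as an illustrative stand-in; final_match and MAIN are invented names, and the all(...) test plays the role of the chained cText Like "%...%" clauses):

```python
def final_match(search_words, candidate_sets, main_rows):
    """Pool the candidate row IDs (the cursor + Group By), then keep only
    rows whose original text contains every search word - the unoptimized
    LIKE pass, cheap because the pool is already small."""
    pool = set().union(*candidate_sets)      # append from each array, de-duped
    words = [w.upper() for w in search_words]
    return sorted(rid for rid in pool
                  if all(w in main_rows[rid].upper() for w in words))

MAIN = {1: "Universal Thread Messages", 2: "Steve Craft"}  # main.dbf stand-in

# one candidate set per search word, as built in the previous step
print(final_match(["universal", "messages"], [{1}, {1, 2}], MAIN))  # -> [1]
```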
My thought is that Rushmore optimizes from left to right only, and does best at pulling small result sets from large ones. So in the first SQL queries, where I am building the arrays, I am matching from the left and keeping at most 500 rows per word - never more than 5,000 total, and in my tests always far less. Rushmore won't optimize the cWhereLike clause in my final query, but because I have a very small number of rows by that point, it hasn't mattered so far.
My testing so far has used words in a char column, not a memo column. The only real difference between the two would be the on-disk size of the keyword table, since the results of each query are limited as soon as possible. Wrapping the above methodology in DATETIME() calls shows that everything is occurring within the same second, which is fast enough for me. :)
There's some more code in between what I wrote to make it a little smoother, but anyone who has $.02, please jump in.
>Hello world,
>
>How do you implement a Google-like search utility in a FoxPro application? I assume someone has figured this out already.