>>I once did that with DBFs, indexing most of the words (I omitted a few: articles, short lowercase words, etc.) in an FPT of about 190 MB. The index table came to 59 MB of DBF plus 64 MB of CDX, and it returned the result set blazingly fast. Searching for an exact phrase within that result set, using a simple atc() on the phrase, was easy and just as fast.
>
>There must be a tremendous percentage of words that occur many times over. A reduction from 190 to 59 is less than I'd expect. How come?
Not your ordinary language. It was all in legalese. And, btw, the 59 MB was the word-to-text links table; the words table itself was only about 6 MB. The links table also had some additional fields, like the position of the word's first appearance in the text and maybe one more, used later to sort results by relevance.
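
For anyone who wants to picture the layout, here is a rough sketch in Python rather than the original FoxPro code. It assumes a small words table, a links table keyed by word and text that keeps the first position of each word, and a final case-insensitive substring check standing in for atc(). The stop list, the function names and the ordering by phrase position are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop list; the post only says articles and short
# lowercase words were skipped.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "or", "and"}


def tokenize(text):
    """Yield (word, character offset) pairs for indexable words."""
    for m in re.finditer(r"[A-Za-z0-9']+", text):
        word = m.group().lower()
        if word not in STOP_WORDS:
            yield word, m.start()


def build_index(docs):
    """docs: {doc_id: text}. Returns (words, links).

    words maps each word to a word id (the small "words table").
    links maps word id -> {doc_id: first position} (the "links table").
    """
    words = {}
    links = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, pos in tokenize(text):
            word_id = words.setdefault(word, len(words))
            links[word_id].setdefault(doc_id, pos)  # keep first appearance only
    return words, links


def search_phrase(phrase, docs, words, links):
    """Narrow by index, then confirm the exact phrase case-insensitively."""
    candidates = None
    for word, _ in tokenize(phrase):
        word_id = words.get(word)
        if word_id is None:
            return []  # an indexable word that appears in no text
        ids = set(links[word_id])
        candidates = ids if candidates is None else candidates & ids
    if candidates is None:
        candidates = set(docs)  # phrase had only stop words; scan everything
    needle = phrase.lower()
    # Stand-in for atc(): case-insensitive substring search on the survivors.
    hits = [(docs[d].lower().find(needle), d) for d in candidates]
    # Crude ordering by where the phrase starts; not the original relevance rule.
    return [d for pos, d in sorted(hits) if pos >= 0]
```

The point of the two steps is that the index lookup discards almost every text cheaply, so the slower substring scan only runs over the handful of candidates that contain all the words.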