>>>>Has anyone seen any code that strips all of the HTML tags and leaves just plain text? It seems I have seen it before. I could easily write it, but why reinvent the wheel?
>>>>
>>>>Thanks in advance.
>>>
>>>You could pretty easily automate IE to do this. Just open the app and use the Document's innerText property.
>>
>>Sorry. Open the file in IE, and use the innerText property of the Document object's body.
>
>
>Can't do. The HTML exists only in a variable in FoxPro. This also has to run at client sites (about 150 of them), and they don't all have IE. I was hoping for some VFP code that could do it. Thanks.
I wanted to do some linguistic statistics, and I have a CD with 200 books on it, so I'm planning to extract just the text. You could simply pick out each tag used within the text and remove it, something like this:
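#define C_LT "<"
#define C_GT ">"
* wordnum() is a FoxTools.fll function; VFP 7 and later also have the native getwordnum()
do while occurs(C_LT, hText) # 0
    lnFirstLeft = at(C_LT, hText)
    * tag body = the text between the first "<" and the next "<" or ">"
    lcTag = wordnum(substr(hText, lnFirstLeft + 1), 1, C_LT + C_GT)
    if at(C_LT + lcTag + C_GT, hText) = 0
        exit    && unmatched "<" or empty tag: nothing to strip, so stop instead of looping forever
    endif
    * strtran() removes every occurrence of this tag in one pass
    hText = strtran(hText, C_LT + lcTag + C_GT, "")
enddo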
In the end, hText should contain plain text with everything between < and > removed. Next, you should replace entities such as ampersand+"nbsp;" with a space, ampersand+"amp;" with a plain ampersand, and so on.
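The entity replacement could look roughly like the sketch below. It assumes hText already holds the tag-stripped text. The variable lcAmp and the choice of entities are mine, for illustration only, and the list is far from complete. The ampersand is built with chr(38) so that VFP has no chance to treat it as macro substitution inside a string literal.

```foxpro
* A sketch only: lcAmp and the entity list are illustrative assumptions
local lcAmp
lcAmp = chr(38)     && the ampersand character, built with chr() to sidestep macro substitution
hText = strtran(hText, lcAmp + "nbsp;", " ")
hText = strtran(hText, lcAmp + "lt;", "<")
hText = strtran(hText, lcAmp + "gt;", ">")
hText = strtran(hText, lcAmp + "quot;", '"')
* decode the ampersand entity last, so an escaped entity is not decoded twice
hText = strtran(hText, lcAmp + "amp;", lcAmp)
```

Doing the ampersand entity last matters: if it went first, text that was meant to show a literal entity name would be decoded a second time by the lines after it.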