Stripping HTML keeping Text
17/11/2010 17:49:47
General information
Forum: Visual FoxPro
Category: Coding, syntax & commands
Miscellaneous
Thread ID: 01489477
Message ID: 01489509
Views: 71
>I'm pulling HTML off the internet and into a string then a memo field. I only want the text. How do I go about cleaning the HTML out of the string?

Here's what I use (from wwUtils.prg):
************************************************************************
FUNCTION StripHTML
*******************
***  Function: Removes HTML tags from the passed text and converts
***            it to plain text. Note: formatting is totally removed!
***    Assume: Only <br> and <p> are translated (to line breaks).
***            Any < or > in the HTML besides tags will break this
***            function.
***      Pass: lcHTMLText - HTML text to strip
***            lcLTag     - Left tag value (defaults to "<")
***            lcRTag     - Right tag value (defaults to ">")
***    Return: Stripped HTML text
*************************************************************************
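LPARAMETERS lcHTMLText, lcLTag, lcRTag
LOCAL lcExtract

*** CRLF is not defined in this snippet; define it here unless an
*** included header file already does
#IFNDEF CRLF
#DEFINE CRLF CHR(13)+CHR(10)
#ENDIF

*** Default the delimiters to < and > when they are not passed
lcLTag=IIF(EMPTY(lcLTag),"<",lcLTag)
lcRTag=IIF(EMPTY(lcRTag),">",lcRTag)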

IF ATC(lcLTag,lcHTMLText) = 0
   RETURN lcHTMLText
ENDIF

*** Start by translating line breaks and paragraphs into CRLFs.
*** Only the exact upper- and lowercase forms below are matched.
lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "BR" + lcRTag,CRLF)
lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "P" + lcRTag,CRLF+CRLF)
lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "br" + lcRTag,CRLF)
lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "p" + lcRTag,CRLF+CRLF)
lcHTMLText = STRTRAN(lcHTMLText,"&nbsp;"," ")

*** Repeatedly pull out the content of the first remaining tag and
*** remove every occurrence of that tag until no tags are left
lcExtract = "x"
DO WHILE !EMPTY(lcExtract)
   lcExtract = STREXTRACT(lcHTMLText,lcLTag,lcRTag,1)

   IF EMPTY(lcExtract)
      EXIT
   ENDIF
   lcHTMLText = STRTRAN(lcHTMLText,lcLTag+lcExtract+lcRTag,"")
ENDDO

lcHTMLText = STRTRAN(lcHTMLText,"&lt;","<")
lcHTMLText = STRTRAN(lcHTMLText,"&gt;",">")

RETURN lcHTMLText
It's not going to work on everything, but it should work reasonably well for retrieving text. Like innerText, it'll run together the text of elements that are bunched up against each other without spaces.
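
To illustrate the bunching issue, here is a rough sketch of how the function might be called, along with one possible workaround. StripHTMLSpaced is a hypothetical wrapper that is not part of wwUtils.prg, the list of closing tags is just an example, and the SET PROCEDURE line assumes wwUtils.prg is on your path. It relies on the nFlags parameter of STRTRAN() (1 = case-insensitive), so treat it as a starting point rather than a finished routine.

```foxpro
*** Assumption: StripHTML lives in wwUtils.prg and the file is on the path
SET PROCEDURE TO wwUtils ADDITIVE

LOCAL lcHtml
lcHtml = "<ul><li>One</li><li>Two</li></ul>"

? StripHTML(lcHtml)          && OneTwo  (adjacent items run together)
? StripHTMLSpaced(lcHtml)    && One Two (followed by a trailing space)

*** Storing the result in a memo field (hypothetical cursor and field)
CREATE CURSOR pages (content M)
INSERT INTO pages (content) VALUES (StripHTMLSpaced(lcHtml))

************************************************************************
FUNCTION StripHTMLSpaced
***  Hypothetical helper: replaces a few closing tags with a space
***  before handing the text to StripHTML, so neighboring elements
***  do not get glued together.
************************************************************************
LPARAMETERS lcHTMLText
LOCAL lcTags, lcTag, lnI

*** Example list only; extend it for the markup you actually receive
lcTags = "</td> </li> </div> </h1> </h2> </h3>"

FOR lnI = 1 TO GETWORDCOUNT(lcTags)
   lcTag = GETWORDNUM(lcTags, lnI)
   *** -1, -1 keep the default start/count (all occurrences);
   *** the final 1 makes the search case-insensitive
   lcHTMLText = STRTRAN(lcHTMLText, lcTag, " ", -1, -1, 1)
ENDFOR

RETURN StripHTML(lcHTMLText)
ENDFUNC
```

The same six-parameter STRTRAN() call could also replace the paired "BR"/"br" and "P"/"p" lines in StripHTML if mixed-case tags show up in your input.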

+++ Rick ---

West Wind Technologies
Maui, Hawaii

west-wind.com/
---
Making waves on the Web

Where do you want to surf today?