Stripping HTML keeping Text
Message
From
18/11/2010 08:28:46
 
General information
Forum: Visual FoxPro
Category: Coding, syntax & commands (Miscellaneous)
Thread ID: 01489477
Message ID: 01489557
Thank you, young fellow. I was working on a StrTran routine last night. You saved me a lot of thinking time.

>>I'm pulling HTML off the internet and into a string then a memo field. I only want the text. How do I go about cleaning the HTML out of the string?
>
>Here's what I use (from wwUtils.prg):
>
>************************************************************************
>FUNCTION StripHTML
>*******************
>***  Function: Removes HTML tags from the passed text and converts
>***            it to plain text. Note formatting is totally removed!
>***    Assume: only <br> and <p> are translated
>***            any < or > in the HTML besides tags will break this
>***            function.
>***      Pass: lcHTMLText - HTML text to strip
>***            lcLTag  -   Left Tag value ("<")
>***            lcRTag  -   Right Tag Value (">")
>***    Return: Stripped HTML text
>*************************************************************************
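>LPARAMETER lcHTMLText, lcLTag, lcRTag
>
>*** CRLF is presumably defined in an include file of the original library;
>*** define it here if needed so the function also runs standalone
>#IFNDEF CRLF
>#DEFINE CRLF CHR(13)+CHR(10)
>#ENDIF
>LOCAL lcExtract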
>
>lcLTag=IIF(EMPTY(lcLTag),"<",lcLTag)
>lcRTag=IIF(EMPTY(lcRTag),">",lcRTag)
>
>IF ATC(lcLTag,lcHTMLText) = 0
>   RETURN lcHTMLText
>ENDIF
>
>*** Start by converting <br> and <p> tags to line breaks
>lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "BR" + lcRTag,CRLF)
>lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "P" + lcRTag,CRLF+CRLF)
>lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "br" + lcRTag,CRLF)
>lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "p" + lcRTag,CRLF+CRLF)
>lcHTMLText = STRTRAN(lcHTMLText,"&nbsp;"," ")
>
>*** Strip the remaining tags one at a time until none are left
>lcExtract = "x"
>DO WHILE !EMPTY(lcExtract)
>   lcExtract = STREXTRACT(lcHTMLText,lcLTag,lcRTag,1)
>
>   IF EMPTY(lcExtract)
>      EXIT
>   ENDIF
>   lcHTMLText = STRTRAN(lcHTMLText,lcLTag+lcExtract+lcRTag,"")
>ENDDO
>
>lcHTMLText = STRTRAN(lcHTMLText,"&lt;","<")
>lcHTMLText = STRTRAN(lcHTMLText,"&gt;",">")
>
>RETURN lcHTMLText
>
>
>It's not going to work on everything, but it should work reasonably well for retrieving text. Like innerText, it will run together the text of adjacent elements that have no whitespace between them.
>
>+++ Rick ---
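
For anyone who finds this thread later, here is a rough usage sketch rather than anything from the original library. It assumes the function above has been pasted into a procedure file called htmlutils.prg (a made-up name) and that CRLF is defined there as CHR(13)+CHR(10). The sample strings are invented for illustration.

```foxpro
* Hypothetical setup: htmlutils.prg is an assumed file name holding
* the StripHTML function quoted above, with CRLF defined.
SET PROCEDURE TO htmlutils ADDITIVE

LOCAL lcHtml, lcText

* Ordinary markup: <br> and <p> become line breaks, other tags are dropped
lcHtml = "<html><body><h1>Title</h1><p>First line<br>Second&nbsp;line</p>" + ;
         "<a href='page.htm'>link</a></body></html>"
lcText = StripHTML(lcHtml)
? lcText
* "Second line" and "link" end up joined as "Second linelink",
* because nothing separates the two elements in the source

* A bare < and > in the text is treated as one big tag
? StripHTML("if a < b then c > d")
* Everything between < and > is removed, leaving "if a  d"
```

The second call shows the limitation noted in the header comment: a stray < or > that is not part of a tag gets swallowed along with the text between them, so it is worth encoding those as &lt; and &gt; before the HTML reaches the function if you control the source.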
I ain't skeert of nuttin eh?
Yikes! What was that?