Stripping HTML keeping Text
Message
From
18/11/2010 10:21:15
 
 
To
18/11/2010 08:28:46
General information
Forum: Visual FoxPro
Category: Coding, syntax & commands
Miscellaneous
Thread ID: 01489477
Message ID: 01489588
Views: 53
Perhaps adding
lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "Br" + lcRTag,CRLF)
lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "bR" + lcRTag,CRLF)
is in order. The "Br" variant happened to me in the wild when working off a similar routine ;-)

Today, with mostly machine-generated tags, there is seldom a need to post-process
tags that differ only in capitalization, but you may run into it.
If you do, keep the previous processing intact, as it performs better and clears most if not all...
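
If mixed-case variants keep turning up, one possible follow-up pass is a case-insensitive STRTRAN. This is only a sketch: it assumes a VFP version whose STRTRAN accepts the nFlags parameter (VFP 7 or later, as far as I know), and the sample string and the CRLF definition are my own illustration, not part of the routine quoted below.

```foxpro
* Sketch only: one case-insensitive pass for <br> variants such as <Br> or <bR>.
* Assumption: CRLF is not already defined in this PRG or an included header.
#DEFINE CRLF CHR(13) + CHR(10)

LOCAL lcHTMLText
lcHTMLText = "one<Br>two<bR>three"   && illustrative input

* Parameters 4 and 5 (-1, -1) keep the default start occurrence and
* replace all occurrences; nFlags = 1 makes the search case-insensitive.
lcHTMLText = STRTRAN(lcHTMLText, "<br>", CRLF, -1, -1, 1)

? lcHTMLText
```

Run it after the exact-case STRTRAN calls rather than instead of them, so the faster case-sensitive replacements still handle the common cases.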

been there done that...

regards

thomas

>Thank you, young fellow. I was working on a StrTran routine last night. You saved me a lot of thinking time.
>
>>>I'm pulling HTML off the internet and into a string then a memo field. I only want the text. How do I go about cleaning the HTML out of the string?
>>
>>Here's what I use (from wwUtils.prg):
>>
>>************************************************************************
>>FUNCTION StripHTML
>>*******************
>>***  Function: Removes HTML tags from the passed text and converts
>>***            it to plain text. Note formatting is totally removed!
>>***    Assume: only <br> and <p> are translated
>>***            any < or > in the HTML besides tags will break this
>>***            function.
>>***      Pass: lcText  -   HTML Text to strip
>>***            lcLTag  -   Left Tag value ("<")
>>***            lcRTag  -   Right Tag Value (">")
>>***    Return: Stripped HTML text
>>*************************************************************************
>>LPARAMETER lcHTMLText, lcLTag, lcRTag
>>
>>lcLTag=IIF(EMPTY(lcLTag),"<",lcLTag)
>>lcRTag=IIF(EMPTY(lcRTag),">",lcRTag)
>>
>>IF ATC(lcLTag,lcHTMLText) = 0
>>   RETURN lcHTMLText
>>ENDIF
>>
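>>*** Start by converting line breaks
>>*** (CRLF is not defined in this function; it is assumed to be #DEFINEd
>>***  elsewhere, e.g. as CHR(13)+CHR(10))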
>>lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "BR" + lcRTag,CRLF)
>>lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "P" + lcRTag,CRLF+CRLF)
>>lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "br" + lcRTag,CRLF)
>>lcHTMLText = STRTRAN(lcHTMLText,lcLTag + "p" + lcRTag,CRLF+CRLF)
>>lcHTMLText = STRTRAN(lcHTMLText,"&nbsp;"," ")
>>
>>lcExtract = "x"   
>>DO WHILE !EMPTY(lcExtract)
>>   lcExtract = STREXTRACT(lcHtmlText,lcLTag,lcRTag,1)
>>   **Extract(lcHTMLText,lcLTag,lcRTag)
>>
>>   IF EMPTY(lcExtract)
>>      EXIT
>>   ENDIF
>>   lcHTMLText = STRTRAN(lcHTMLText,lcLTag+lcExtract+lcRTag,"")
>>ENDDO
>>
>>lcHTMLText = STRTRAN(lcHTMLText,"&lt;","<")
>>lcHTMLText = STRTRAN(lcHTMLText,"&gt;",">")
>>
>>RETURN lcHTMLText
>>
>>
>>It's not going to work on everything, but it should work reasonably well to retrieve text. Like innerText, it'll mangle elements that are bunched up against each other without spaces.
>>
>>+++ Rick ---
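
For anyone trying the quoted routine on its own, a minimal calling sketch might look like the following. It assumes StripHTML has been pasted into the same PRG below this code, and that CRLF is not otherwise defined, so it is defined here; the sample HTML and variable names are purely illustrative.

```foxpro
* Sketch only: calling the quoted StripHTML routine.
* Assumption: FUNCTION StripHTML is pasted into this same PRG, below this code,
* so that the #DEFINE applies when the function is compiled.
#DEFINE CRLF CHR(13) + CHR(10)

LOCAL lcHtml, lcText
lcHtml = "<p>Hello<br>world&nbsp;!</p>"   && illustrative input

* The left and right tag delimiters default to "<" and ">" when omitted.
lcText = StripHTML(lcHtml)

? lcText
```

The resulting string can then be stored in the memo field as usual. Tags that carry attributes are not converted to line breaks by the exact-match STRTRAN calls; the STREXTRACT loop simply strips them.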