Clean up Word HTML
Message from Dragan Nedeljkovich, 12/07/2006 15:35:01
Now officially retired
Zrenjanin, Serbia
General information
Forum: Visual FoxPro
Category: Internet applications
Environment versions
Visual FoxPro: VFP 9 SP1
OS: Windows Server 2003
Database: Visual FoxPro
Miscellaneous
Thread ID: 01135747
Message ID: 01135837
>Has anyone developed code to clean up what passes for HTML from a Word document? I have a few apps with users pasting documents from Word - either directly or indirectly - and it makes a mess and sometimes breaks my web application pages.

If I remember correctly, it was either Ted Roche or Rich Schummer who demonstrated such a tool at one of the conferences, but mostly in passing; the emphasis was on what was done with the cleaned-up document afterwards. I think it was at the last Whilfest, in 2003. I've browsed through my conference downloads but couldn't find it, so I'm not sure my memory is quite right :).

Generally, you could parse the text for tags and then untag each element.
Here's my untag function, which returns whatever sits between the first tag in a string and its matching closing tag:
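Procedure untag(c)
Local lcTag, lcCloseTag
lcTag=Strextract(c, "<", ">",1,4)  && first tag, delimiters included
* Tag name: first word after "<"; ">" and line breaks must be delimiters too
lcCloseTag="</"+Getwordnum(lcTag, 1, "< >"+Chr(13)+Chr(10))+">"
* Text between the tags; 1 = ignore case, 2 = closing tag optional
Return Strextract(c, lcTag, lcCloseTag,1,1+2)
Endproc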
You can just chop the text into paragraphs (looking for pairs of matching P or H tags and anything between them), then work your way inside each paragraph, stripping some tags as you go.
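A rough sketch of that paragraph-chopping approach might look like the code below. It is untested and written for illustration only: CleanWordHtml() and StripTags() are hypothetical names, not part of VFP or of the tool mentioned above. It walks the text tag by tag, keeps each P or H1..H6 block as a bare tag pair, strips every tag inside it (including Word's <span style=...> and <o:p></o:p> wrappers), and drops anything outside such blocks. It assumes closing tags are usually present (if one is missing, the rest of the text is taken as that block) and does not decode entities such as &nbsp;; a real cleaner would probably keep a few inline tags such as B and I instead of stripping all of them.

```foxpro
* Hypothetical sketch: CleanWordHtml() and StripTags() are illustrative names,
* not part of VFP or of any existing tool.
LOCAL lcHtml, lcClean
lcHtml = [<p class=MsoNormal><span style='font-size:10.0pt'>Hello <b>world</b>] + ;
         [<o:p></o:p></span></p><h1>Title</h1>]
lcClean = CleanWordHtml(lcHtml)
? lcClean

FUNCTION CleanWordHtml(tcHtml)
    * Keeps P and H1..H6 blocks (as bare tags) and strips all markup inside them;
    * anything outside such blocks is dropped.
    LOCAL lcResult, lcRest, lcTag, lcName, lcClose, lnEnd, lcDelims
    lcDelims = "< >" + CHR(9) + CHR(13) + CHR(10)   && Word wraps lines inside tags
    lcResult = ""
    lcRest = tcHtml
    DO WHILE .T.
        lcTag = STREXTRACT(lcRest, "<", ">", 1, 4)   && next tag, delimiters included
        IF EMPTY(lcTag)
            EXIT
        ENDIF
        lcName = LOWER(GETWORDNUM(lcTag, 1, lcDelims))
        lcRest = SUBSTR(lcRest, AT(lcTag, lcRest) + LEN(lcTag))   && skip past the tag
        IF INLIST(lcName, "p", "h1", "h2", "h3", "h4", "h5", "h6")
            lcClose = "</" + lcName + ">"
            lnEnd = ATC(lcClose, lcRest)   && case-insensitive: Word may write </P>
            IF lnEnd = 0
                lnEnd = LEN(lcRest) + 1    && no closing tag: take the rest
            ENDIF
            lcResult = lcResult + "<" + lcName + ">" + ;
                StripTags(LEFT(lcRest, lnEnd - 1)) + lcClose + CHR(13) + CHR(10)
            lcRest = SUBSTR(lcRest, lnEnd + LEN(lcClose))
        ENDIF
    ENDDO
    RETURN lcResult
ENDFUNC

FUNCTION StripTags(tcText)
    * Removes every <...> tag, leaving only the text between them
    LOCAL lcText, lcTag
    lcText = tcText
    lcTag = STREXTRACT(lcText, "<", ">", 1, 4)
    DO WHILE NOT EMPTY(lcTag)
        lcText = STRTRAN(lcText, lcTag, "")
        lcTag = STREXTRACT(lcText, "<", ">", 1, 4)
    ENDDO
    RETURN lcText
ENDFUNC
```

Run as a PRG, this should print <p>Hello world</p> and <h1>Title</h1> on separate lines. Working block by block keeps the paragraph structure, so stripping the inner tags can be as aggressive as you like without running paragraphs together.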

back to same old

the first online autobiography, unfinished by design
What, me reckless? I'm full of recks!
Balkans, eh? Count them.