Clean up Word HTML - Level Extreme



Message

From

12/07/2006 15:35:01

Dragan Nedeljkovich
Now officially retired
Zrenjanin, Serbia

To

12/07/2006 12:15:03

Michael Hogan
Ideate, LLC
Chicago, Illinois, United States

General information
Forum: Visual FoxPro
Category: Internet applications
Title: Re: Clean up Word HTML

Environment versions
Visual FoxPro: VFP 9 SP1
OS: Windows Server 2003
Database: Visual FoxPro

Miscellaneous
Thread ID: 01135747
Message ID: 01135837


>Has anyone developed code to clean up what passes for HTML from a Word document? I have a few apps where users paste documents from Word, either directly or indirectly, and it makes a mess and sometimes breaks my web application pages.

If I remember correctly, either Ted Roche or Rich Schummer demonstrated such a tool at one of the conferences, but mostly in passing; the emphasis was on what was done with the cleaned-up document afterwards. I think it was at the last Whilfest, in 2003. I've browsed through my conference downloads but couldn't find it, so I'm not sure my memory is quite OK :).

Generally, you could parse the text for tags and then untag the thing.
Here's my untag function, which returns the text between the first tag in a string and its matching closing tag:

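Procedure untag(c)
* Text between the first tag in c and its closing tag
Local lcTag, lcCloseTag
* First tag with its delimiters (flag 4), e.g. "<p class=MsoNormal>"
lcTag=Strextract(c, "<", ">",1,4)
* ">" and whitespace must be delimiters too, or "<p>" gives "</p>>"
lcCloseTag="</"+Getwordnum(lcTag, 1, "<> "+Chr(9)+Chr(13)+Chr(10))+">"
* Flags: 1 = ignore case, 2 = closing tag not required
Return Strextract(c, lcTag, lcCloseTag,1,1+2)
Endproc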
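A minimal sketch of the "parse the text for tags" step, assuming literal "<" characters are escaped as &lt; (as they are in Word's HTML output): the hypothetical ListTags function below walks every "<...>" pair with STREXTRACT and collects the distinct opening-tag names, which shows what a pasted document actually contains before deciding what to untag or strip.

```foxpro
* Sketch: list the distinct tag names used in a block of Word HTML, to see
* what needs stripping. ListTags is a hypothetical helper; it assumes literal
* "<" characters in the text are escaped as &lt; (as Word does).
FUNCTION ListTags(tcHtml)
	LOCAL lcNames, lcTag, lcName, lnI, lnCount
	lcNames = ""
	* Every tag starts with "<", so this is an upper bound on the tag count
	lnCount = OCCURS("<", tcHtml)
	FOR lnI = 1 TO lnCount
		* lnI-th tag including its delimiters (flag 4), e.g. "<span lang=EN-US>"
		lcTag = STREXTRACT(tcHtml, "<", ">", lnI, 4)
		* Skip closing tags, comments/conditionals (<!...) and <?xml...> lines
		IF EMPTY(lcTag) OR INLIST(SUBSTR(lcTag, 2, 1), "/", "!", "?")
			LOOP
		ENDIF
		* Tag name = first word; Word may put a line break right after it
		lcName = LOWER(GETWORDNUM(lcTag, 1, "<> " + CHR(9) + CHR(13) + CHR(10)))
		IF !((" " + lcName + " ") $ (" " + lcNames + " "))
			lcNames = lcNames + IIF(EMPTY(lcNames), "", " ") + lcName
		ENDIF
	ENDFOR
	RETURN lcNames
ENDFUNC
```

On a typical Word paste, ListTags(lcHtml) (lcHtml being a placeholder for the pasted text) would be expected to report names such as span and o:p alongside p.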

You can just chop the text into paragraphs (looking for pairs of matching P or H tags and anything between them), then work your way inside each paragraph, stripping some tags as you go.
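The paragraph-chopping approach can be sketched along these lines. This is a minimal illustration under stated assumptions, not a complete cleaner: it visits only P tags (H1 to H6 could be handled the same way, but keeping them in document order would need a position-based scan), it assumes P tags are not nested and that no other tag name starts with "p" (such as <pre>), and it assumes literal "<" is escaped as &lt;. CleanParagraphs, StripTags and the tcKeep list are hypothetical names. Each paragraph loses its opening tag and Word's attributes, inner tags not listed in tcKeep are removed, and the ones that are kept are reduced to bare tags.

```foxpro
* Sketch: rebuild Word HTML as bare <p> paragraphs, keeping only the inline
* tags named in tcKeep. All names here are hypothetical. Assumes P tags are
* not nested, no other tag name starts with "p" (<pre>, <param>), and
* literal "<" in the text is escaped as &lt;.
FUNCTION CleanParagraphs(tcHtml, tcKeep)
	LOCAL lcResult, lcPara, lnCount, lnI
	lcResult = ""
	* One closing </p> per paragraph
	lnCount = OCCURS("</p>", LOWER(tcHtml))
	FOR lnI = 1 TO lnCount
		* Text after the lnI-th "<p" up to the next "</p>" (flag 1 = ignore case)
		lcPara = STREXTRACT(tcHtml, "<p", "</p>", lnI, 1)
		* Drop the rest of the opening tag, e.g. " class=MsoNormal>"
		lcPara = SUBSTR(lcPara, AT(">", lcPara) + 1)
		* Word wraps long lines inside a paragraph; turn CR/LF into spaces
		lcPara = CHRTRAN(lcPara, CHR(13) + CHR(10), "  ")
		lcPara = ALLTRIM(StripTags(lcPara, tcKeep))
		IF !EMPTY(lcPara)
			lcResult = lcResult + "<p>" + lcPara + "</p>" + CHR(13) + CHR(10)
		ENDIF
	ENDFOR
	RETURN lcResult
ENDFUNC

FUNCTION StripTags(tcText, tcKeep)
	* Removes every tag except those named in tcKeep (space-delimited,
	* e.g. "b i"); kept tags lose their attributes
	LOCAL lcText, lcTag, lcName, lcBare, lnOcc
	IF VARTYPE(tcKeep) # "C"
		tcKeep = ""
	ENDIF
	lcText = tcText
	lnOcc = 1
	DO WHILE .T.
		* lnOcc-th tag with its delimiters (flag 4); empty when none is left
		lcTag = STREXTRACT(lcText, "<", ">", lnOcc, 4)
		IF EMPTY(lcTag)
			EXIT
		ENDIF
		lcName = LOWER(GETWORDNUM(lcTag, 1, "</> " + CHR(9)))
		IF (" " + lcName + " ") $ (" " + LOWER(tcKeep) + " ")
			* Keep it as a bare <name> or </name>, then look at the next tag
			lcBare = "<" + IIF(SUBSTR(lcTag, 2, 1) == "/", "/", "") + lcName + ">"
			lcText = STRTRAN(lcText, lcTag, lcBare)
			lnOcc = lnOcc + 1
		ELSE
			* Remove every copy of this exact tag string
			lcText = STRTRAN(lcText, lcTag, "")
		ENDIF
	ENDDO
	RETURN lcText
ENDFUNC
```

For example, CleanParagraphs(lcHtml, "b i") would keep bold and italics and drop Word's span and o:p markup. Removing a tag such as <br> outright can run words together, so a fuller version would probably map some tags to a space instead of deleting them.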

back to same old
the first online autobiography, unfinished by design
What, me reckless? I'm full of recks!
Balkans, eh? Count them.
