Removing tags from a long string
19/01/2016 09:20:57
General information
Forum: ASP.NET
Category: Coding, syntax and commands

Environment versions
Environment: VB 9.0
OS: Windows 8.1
Network: Windows 2008 Server
Database: MS SQL Server
Application: Web

Miscellaneous
Thread ID: 01629858
Message ID: 01629954
Views: 41
>>You might want to look at an HTML parser like the one in HtmlAgilityPack. This will be more reliable, as it can parse the HTML into a DOM and then return the innerText. HtmlAgilityPack can be pretty fast for lots of things, but more importantly it generally does a better job of pulling out properly formatted text, separating elements, etc.
>>
>>It's not easy to get that logic right.
>
>Thanks, that is another interesting approach.
>
>I am not sure however if it can perform as fast as a simple RegEx() command.

You'd be surprised. An optimized DOM parser that you can feed small snippets to will be very fast and, as I said, much more reliable in producing valid text. There are a lot of things you need to worry about when parsing HTML strings: HTML encoding, characters in the content that might interfere with the characters your RegEx matches on, etc.
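To make those two problems concrete, here is a minimal sketch. It uses Python's standard library as a stand-in rather than HtmlAgilityPack itself, so treat it as an illustration of the idea and not of that library's API. The function names, the sample string, and the choice to skip script and style content are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser

# Naive approach: drop anything that looks like <...>.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags_regex(html: str) -> str:
    """Remove tags with a single regex; entities are left untouched."""
    return TAG_RE.sub("", html)


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping script and style content."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; in text nodes.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the pieces, then collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def strip_tags_parser(html: str) -> str:
    """Remove tags by actually parsing the markup."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return collector.text()


# Hypothetical input: a '>' inside an attribute value plus an encoded '&'.
SAMPLE = '<p title="a > b">Fish &amp; <b>chips</b></p>'

if __name__ == "__main__":
    print(repr(strip_tags_regex(SAMPLE)))   # ' b">Fish &amp; chips'
    print(repr(strip_tags_parser(SAMPLE)))  # 'Fish & chips'
```

The regex stops at the first `>` it sees, so part of the attribute leaks into the output, and the `&amp;` stays encoded. The parser version handles both because it knows where a tag really ends and decodes entities as it goes.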

I've been down this road a few times, and let me tell you there are lots of edge cases and it gets much worse when you deal with nested elements...

Before you re-invent the wheel, look at HtmlAgilityPack. It'll take you 15 minutes to determine how well it performs, and it will be well worth your time if it works (and you'll probably find other uses for it).

+++ Rick ---

West Wind Technologies
Maui, Hawaii

west-wind.com/
West Wind Message Board
Rick's Web Log
Markdown Monster
---
Making waves on the Web

Where do you want to surf today?