Removing tags from a long string
19/01/2016 09:20:57
General information
Forum: ASP.NET
Category: Code, syntax and commands
Environment versions
Environment: VB 9.0
OS: Windows 8.1
Network: Windows 2008 Server
Database: MS SQL Server
Application: Web
Miscellaneous
Thread ID: 01629858
Message ID: 01629954
Views: 42
>>You might want to look at an HTML parser like the one in HtmlAgilityPack. This will be more reliable, as it can destructure the HTML into a DOM and then return the innerText. HtmlAgilityPack can be pretty fast for lots of things, but more importantly it will generally do a better job of pulling out properly formatted text, separating elements, etc.
>>
>>It's not easy to get that logic right.
>
>Thanks, that is another interesting approach.
>
>I am not sure however if it can perform as fast as a simple RegEx() command.

You'd be surprised. An optimized DOM parser that you can feed small snippets to will be very fast and, as I said, much more reliable in producing valid text. There are a lot of things you need to worry about when parsing HTML strings: HTML encoding, characters in the content that might interfere with the characters your RegEx matches on, etc.

I've been down this road a few times, and let me tell you there are lots of edge cases and it gets much worse when you deal with nested elements...
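
To make a couple of those edge cases concrete, here is a minimal sketch. It is an assumption-laden stand-in, not HtmlAgilityPack: it uses Python's standard-library html.parser only because it is easy to run anywhere, and the names strip_tags_regex, strip_tags_parser and the sample string are made up for illustration. The point is just to compare a naive tag-stripping RegEx against a real parser on input that contains an HTML entity, a ">" inside an attribute value, and a script block.

```python
import re
from html.parser import HTMLParser

# Illustrative stand-in: this is Python's html.parser, NOT HtmlAgilityPack.
# All function and variable names here are hypothetical.


def strip_tags_regex(html: str) -> str:
    """Naive approach: delete anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", html)


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text nodes.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join text nodes with a space and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_tags_parser(html: str) -> str:
    """Parser-based approach: walk the markup and keep only text nodes."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    '<p title="a > b">Fish &amp; chips</p>'
    "<script>if (a < b) { run(); }</script>"
    "<p>Done</p>"
)

if __name__ == "__main__":
    print(repr(strip_tags_regex(sample)))
    print(repr(strip_tags_parser(sample)))
```

On this sample the parser version returns "Fish & chips Done". The RegEx version leaves attribute debris (b">), keeps the entity as &amp;, and lets part of the script leak into the text. Each of those can be patched with a more elaborate pattern, but that is exactly the pile of special cases a parser already handles.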

Before you re-invent the wheel, look at HtmlAgilityPack. It'll take you 15 minutes to determine how well it performs, and it will be well worth your time if it works (and you'll probably find other uses for it).

+++ Rick ---

West Wind Technologies
Maui, Hawaii

west-wind.com/