>>You might want to look at an HTML parser like HtmlAgilityPack. It will be more reliable because it parses the HTML into a DOM and then returns the innerText. HtmlAgilityPack can be pretty fast for a lot of things, but more importantly it will generally do a better job of pulling out properly formatted text, separating elements, etc.
>>
>>It's not easy to get that logic right.
>
>Thanks, that is another interesting approach.
>
>I am not sure however if it can perform as fast as a simple RegEx() command.
But it will probably be more reliable. Consider the following HTML:
<form onsubmit="return NumberOfValidEmailAddresses() > 0;"></form>
The RegEx will fail because it treats the > inside the onsubmit attribute as the end of the tag. An HTML parser will see that > as part of the attribute value and handle it properly. If you want a RegEx that finds tags correctly, it will be much more complex. See
http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx/
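
To make the failure concrete, here is a rough sketch of the difference. It is in Python with the standard library's html.parser standing in for HtmlAgilityPack, so it is not the HtmlAgilityPack API, just the same idea. The "Sign up" text inside the form, the naive pattern, and the TextExtractor / parsed_text names are my own assumptions for illustration.

```python
import re
from html.parser import HTMLParser

# The form from the example above; the "Sign up" text is added here
# (an assumption) so there is some inner text to extract.
html = '<form onsubmit="return NumberOfValidEmailAddresses() > 0;">Sign up</form>'

# Naive tag stripper: assumes the first '>' after '<' closes the tag.
naive_text = re.sub(r'<[^>]*>', '', html)


class TextExtractor(HTMLParser):
    """Hypothetical helper that collects only the text nodes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags, never for attribute values.
        self.parts.append(data)


def parsed_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return ''.join(parser.parts)


print(repr(naive_text))         # ' 0;">Sign up'  (attribute debris leaks through)
print(repr(parsed_text(html)))  # 'Sign up'
```

The RegEx version stops at the > in the script and leaves the rest of the attribute in the output, while the parser keeps the whole quoted attribute together and returns only the text.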