Hi All,
Given a web page that may contain various parts, is there a way to extract the text that most likely represents the page's main purpose? For example, take this page:
http://news.bbc.co.uk/1/hi/business/10281079.stm
It contains an article on the oil spill but is surrounded by various other content: links, adverts, etc. Is there a technique for extracting the main article's text? I understand this may not always be exact, but I'm looking for a "good enough" solution.
Thanks
Jos