Get web page text - Level Extreme

From: Thomas Ganss, Main Trend, Frankfurt, Germany (10/06/2010 13:12:16)
In reply to: Jos Pols, C., South Africa (10/06/2010 11:19:07)

Forum: Visual FoxPro
Category: Coding, syntax & commands
Title: Re: Get web page text
Visual FoxPro version: VFP 9 SP2
Thread ID: 01468306
Message ID: 01468340

>Hi All,
>
>Given a web page that may contain various parts, is there a way to extract the text of what is most likely the page's main purpose? For example, take this page: http://news.bbc.co.uk/1/hi/business/10281079.stm . It contains an article on the oil spill but is surrounded by various other stuff: links, adverts, etc. Is there a technique for getting the main article's text? I understand this may not always be exact, but I'm looking for a "good enough" solution.

Many years ago (back when pages carried less advertising and used much simpler markup)
I used to automate IE as a spider. Even back then a fully automatic approach was not possible,
but I built a treeview (TV) of the HTML with some additional info per element (name, text length,
current level) plus filtering options, for example showing only the links to get hints on where to go next.
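A rough modern equivalent of that outline, sketched in Python with the standard-library `html.parser` instead of automating IE (the original tool was a VFP treeview, so this is only an illustration under that substitution). Each element is listed with its tag name, optional `id`, nesting level and the length of the visible text it contains, and a `links_only` switch mimics the "show only links" filter. The names `OutlineBuilder`, `OutlineNode` and `outline` are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are recorded but not kept open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class OutlineNode:
    def __init__(self, tag, depth, attrs):
        self.tag = tag
        self.depth = depth          # nesting level, 0 = top-level element
        self.attrs = dict(attrs)
        self.text_len = 0           # visible characters inside, descendants included


class OutlineBuilder(HTMLParser):
    """Hypothetical helper: records every element in document order."""

    def __init__(self):
        super().__init__()
        self.stack = [OutlineNode("#root", -1, [])]
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        node = OutlineNode(tag, len(self.stack) - 1, attrs)
        self.nodes.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Tolerate sloppy markup: close back to the nearest matching open tag, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # Script and style bodies are not visible text.
        if any(n.tag in ("script", "style") for n in self.stack):
            return
        size = len(" ".join(data.split()))  # collapse runs of whitespace
        for node in self.stack:
            node.text_len += size


def outline(html, links_only=False):
    """Return one indented line per element; links_only keeps just <a href=...>."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    lines = []
    for node in builder.nodes:
        href = node.attrs.get("href")
        if links_only and not (node.tag == "a" and href):
            continue
        label = "  " * node.depth + node.tag
        if node.attrs.get("id"):
            label += "#" + node.attrs["id"]
        label += f" len={node.text_len}"
        if node.tag == "a" and href:
            label += f" href={href}"
        lines.append(label)
    return lines


sample = ("<html><body><div id='nav'><a href='/a'>Home</a></div>"
          "<div id='story'><p>Oil spill text here.</p><br></div></body></html>")
for line in outline(sample):
    print(line)
# html len=24
#   body len=24
#     div#nav len=4
#       a len=4 href=/a
#     div#story len=20
#       p len=20
#       br len=0
```

A container with a large `len` is often where the article sits, but that is only a hint for a human reading the outline, not a reliable automatic rule.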

That way I could work out extraction rules for each site; the rules themselves were just stored in some memo fields.
As the sites changed subtly about four times per year, I needed a fast way to alter my scripts, and that meant
tooling that helped me "read" the pages. Going at it fully automatically was beyond my scope.
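Keeping the per-site rules as plain data, separate from the extraction code, is what makes them quick to edit when a site changes. The Python sketch below assumes a deliberately simple rule format (a tag name plus an `id` for the element holding the article); the `SITE_RULES` dict stands in for the memo fields, and the host and `id` in it are made-up placeholders, not real site markup. `RuleExtractor` and `extract_main_text` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical rule table standing in for the memo fields: host -> container element.
SITE_RULES = {
    "www.example.com": {"tag": "div", "id": "story"},  # placeholder, not real markup
}


class RuleExtractor(HTMLParser):
    """Collects the text inside the first <tag id=...> element."""

    def __init__(self, tag, elem_id):
        super().__init__()
        self.tag, self.elem_id = tag, elem_id
        self.depth = 0   # open `tag` elements inside the match; 0 = outside it
        self.skip = 0    # open <script>/<style> elements inside the match
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag in ("script", "style"):
                self.skip += 1
            elif tag == self.tag:
                self.depth += 1
        elif tag == self.tag and dict(attrs).get("id") == self.elem_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if self.skip and tag in ("script", "style"):
            self.skip -= 1
        elif tag == self.tag:
            self.depth -= 1
            self.done = self.depth == 0  # stop after the first matching container

    def handle_data(self, data):
        if self.depth and not self.skip:
            text = " ".join(data.split())
            if text:
                self.parts.append(text)


def extract_main_text(url, html):
    """Apply the stored rule for the URL's host; None means no rule exists yet."""
    rule = SITE_RULES.get(urlparse(url).hostname)
    if rule is None:
        return None
    parser = RuleExtractor(rule["tag"], rule["id"])
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = ("<div id='nav'><a href='/'>Home</a></div>"
        "<div id='story'><p>First <b>para</b>.</p><div>Inner</div>"
        "<script>var x;</script></div><p>Footer</p>")
print(extract_main_text("http://www.example.com/story.html", page))  # First para . Inner
```

When a site changes its layout, only the entry in the rule table needs updating; a host with no rule returns `None`, which is the cue to look at the page again and write one.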

my 0.22 EUR

thomas
