URL follow - Level Extreme

URL follow

Message

From

21/11/2010 17:47:35

Grady McCue
Old fellow
The Grove, Alberta, Canada

21/11/2010 16:47:10

Tore Bleken (Online)
Vear, Norway

General information

Forum:

Visual FoxPro

Category:

Coding, syntax & commands

Title:

Re: URL follow

Miscellaneous

Thread ID:

01489932

Message ID:

01490055


>>>>>>Is there VFP code available that will navigate to all the sub-links under a primary URL? And, perhaps even follow links outside the URL?
>>>>>
>>>>>Does this help:
>>>>>
>>>>>Crawler Download #9894
>>>>
>>>>
>>>>It's close :) Does the download come with source code?
>>>
>>>Seems you want a crawling/extracting robot. If you are working from a home line, I recommend going back to automating IE, as many of your problems are more than halfway eliminated by the already parsed state of the page in the IE DOM. For this specific use case you should iterate the links collection, which can be handled as a zero-based array from VFP. As I constructed quite a few of those in IE4 times, I know what a dev-time saver the object model can be. The limiting factor at runtime will probably always be the internet connection or the server if you start parallel processes on multicore machines, even if parsing/rendering in IE takes more time than dedicated code.
>>>
>>>regards
>>>
>>>thomas
>>
>>Thanks Thomas. Yes, I am writing my first crawler. I am trying to find a way to get to the areas of news sites where they allow people to leave comments. I want to mine the comments for a snippet of the writing. The URLs for the comment areas have additional information appended.
>>
>>Example: Primary URL http://world-news.newsvine.com/
>>Example: Secondary URL. http://world-news.newsvine.com/_news/2010/11/21/5502717-report-would-be-plane-bombers-post-attack-details#comments
>>
>>So I want to know if I can (with VFP and help from the UT) determine the added portion of the secondary URL.
>>
>>I could also use the crawler above to get a lot of data, but I also want to 'know' what code is used to acquire all the text from a website. I don't want the HTML etc. Just text. With Tore's help I am getting to the primary URL and extracting text snippets. I just need to drill down a bit more.
>
>Grady,
>
>to get the links, I guess you must check the innerHTML, not the innerText.

For those interested, here is a stand-alone drill-down routine I got from Calvin Hsia at MS:

* Calvin Hsia's website drill down code
* Build a unique temp file name for the downloaded page
cTempFile=ADDBS(GETENV("TEMP"))+SYS(3)+".htm"
LOCAL oHTTP as "winhttp.winhttprequest.5.1"
oHTTP=NEWOBJECT("winhttp.winhttprequest.5.1")
* Synchronous GET (.f. = not async), so Send() returns once the response has arrived
oHTTP.Open("GET","http://blogs.msdn.com/calvin_hsia/archive/2004/06/28/168054.aspx",.f.)
oHTTP.Send()
* Save the raw HTML, then show the local copy in IE
STRTOFILE(oHTTP.ResponseText,cTempFile)
oIE=CREATEOBJECT("InternetExplorer.Application")
oIE.Visible=.t.
oIE.Navigate(cTempFile)
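
Following Thomas's suggestion above to iterate the links collection, here is a rough, untested sketch of how the "added portion" of each secondary URL might be listed. It assumes the page is loaded live in IE rather than from the temp file, that waiting on Busy/ReadyState is enough for the page to finish loading, and that a simple prefix comparison against the primary URL is good enough. The variable names (cPrimary, oLinks, cHref) are mine, not part of Calvin's routine.

```foxpro
* Sketch only: list the links under a primary URL through the IE DOM
LOCAL oIE, oLinks, i, cPrimary, cHref
cPrimary = "http://world-news.newsvine.com/"
oIE = CREATEOBJECT("InternetExplorer.Application")
oIE.Navigate(cPrimary)
* Wait until IE reports the page as loaded (ReadyState 4 = complete)
DO WHILE oIE.Busy OR oIE.ReadyState <> 4
	DOEVENTS
	=INKEY(0.2)
ENDDO
* The links collection is zero based
oLinks = oIE.Document.links
FOR i = 0 TO oLinks.length - 1
	cHref = oLinks.item(i).href
	* Keep only links that start with the primary URL
	IF LOWER(LEFT(cHref, LEN(cPrimary))) == LOWER(cPrimary)
		* Print the part that was added after the primary URL
		? SUBSTR(cHref, LEN(cPrimary) + 1)
	ENDIF
ENDFOR
oIE.Quit()
```

To pick out only the comment areas, the IF could additionally test for "#comments" in cHref with AT(), since the secondary URL in the example ends that way.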

I ain't skeert of nuttin eh?
Yikes! What was that?
