URL follow
Message
From
21/11/2010 17:47:35
 
 
To
21/11/2010 16:47:10
General information
Forum: Visual FoxPro
Category: Coding, syntax & commands
Title: Miscellaneous
Thread ID: 01489932
Message ID: 01490055
Views: 47
>>>>>>Is there VFP code available that will navigate to all the sub-links under a primary URL? And, perhaps even follow links outside the URL?
>>>>>
>>>>>Does this help:
>>>>>
>>>>>Crawler Download #9894
>>>>
>>>>
>>>>It's close :) Does the download come with source code?
>>>
>>>It seems you want a crawling/extracting robot. If you are working from a home line, I recommend going back to automating IE, as many of your problems are more than halfway solved by the already parsed state of the page in the IE DOM. For this specific use case you should iterate the links collection, which can be handled as a zero-based array from VFP. Having constructed quite a few of those in the IE4 days, I know what a dev-time saver the object model can be. The limiting factor at runtime will probably always be the internet connection or the server if you start parallel processes on multicore machines, even if parsing/rendering in IE takes more time than dedicated code.
>>>
>>>regards
>>>
>>>thomas
>>
>>Thanks Thomas. Yes, I am writing my first crawler. I am trying to find a way to get to the areas of news sites where they allow people to leave comments. I want to mine the comments for snippets of the writing. The URLs for the comment areas have additional information appended.
>>
>>Example: Primary URL http://world-news.newsvine.com/
>>Example: Secondary URL http://world-news.newsvine.com/_news/2010/11/21/5502717-report-would-be-plane-bombers-post-attack-details#comments
>>
>>So I want to know if I can (with VFP and help from the UT) determine the added portion of the secondary URL.
>>
>>I could also use the crawler above to get a lot of data, but I also want to know what code is used to acquire all the text from a website. I don't want the HTML etc., just the text. With Tore's help I am getting to the primary URL and extracting text snippets. I just need to drill down a bit more.
>
>Grady,
>
>to get the links, I guess you must check the innerHTML, not the innerText.
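
A minimal sketch of the links-collection approach Thomas describes, assuming Internet Explorer automation is available on the machine. The names cPrimary, oLinks and cHref are illustrative only, and the wait loop on ReadyState is an assumed way to let the page finish loading. For every link that begins with the primary URL it prints the added portion, which is the part that would identify the comment pages.

```foxpro
* Sketch only: list the added portion of every link under a primary URL.
* cPrimary, oLinks and cHref are illustrative names, not from the thread.
LOCAL oIE, oLinks, i, cPrimary, cHref
cPrimary = "http://world-news.newsvine.com/"
oIE = CREATEOBJECT("InternetExplorer.Application")
oIE.Navigate(cPrimary)
* Assumed wait loop: 4 is READYSTATE_COMPLETE
DO WHILE oIE.Busy OR oIE.ReadyState <> 4
   DOEVENTS
ENDDO
oLinks = oIE.Document.links
* The links collection is zero based
FOR i = 0 TO oLinks.length - 1
   cHref = oLinks.item(i).href
   * Keep only links that begin with the primary URL
   IF LOWER(LEFT(cHref, LEN(cPrimary))) == LOWER(cPrimary)
      ? SUBSTR(cHref, LEN(cPrimary) + 1)
   ENDIF
ENDFOR
oIE.Quit()
```

Links that fail the prefix test point outside the primary URL, so the same loop could collect them separately if following external links is wanted.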

For those interested, here is a standalone drill-down routine I got from Calvin Hsia at MS.

* Calvin Hsia's website drill-down code
LOCAL oHTTP as "winhttp.winhttprequest.5.1"
LOCAL cHTML, cTempFile, oIE
* Temp file that will hold the downloaded page
cTempFile=ADDBS(GETENV("TEMP"))+SYS(3)+".htm"
oHTTP=NEWOBJECT("winhttp.winhttprequest.5.1")
* Synchronous GET (third parameter .f. = not asynchronous)
oHTTP.Open("GET","http://blogs.msdn.com/calvin_hsia/archive/2004/06/28/168054.aspx",.f.)
oHTTP.Send()
cHTML=oHTTP.ResponseText
STRTOFILE(cHTML,cTempFile)
* Show the saved copy in Internet Explorer
oIE=CREATEOBJECT("InternetExplorer.Application")
oIE.Visible=.t.
oIE.Navigate(cTempFile)
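
As a possible follow-up to the routine above, the text without the HTML can be read from the loaded document. This is only a sketch, under the assumption that the page must finish loading before the DOM is touched; the wait loop and the name cText are my own additions, not part of Calvin's code.

```foxpro
* Sketch only: read the visible text of a page, without the markup.
LOCAL oIE, cText
oIE = CREATEOBJECT("InternetExplorer.Application")
oIE.Navigate("http://world-news.newsvine.com/")
* Assumed wait loop: 4 is READYSTATE_COMPLETE
DO WHILE oIE.Busy OR oIE.ReadyState <> 4
   DOEVENTS
ENDDO
* innerText returns the text only; innerHTML would include the tags
cText = oIE.Document.body.innerText
? LEFT(cText, 200)
oIE.Quit()
```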
I ain't skeert of nuttin eh?
Yikes! What was that?