Coding question
19/10/2012 08:32:37

General information
Forum: Visual FoxPro
Category: Coding, syntax and commands
Title: Divers
Thread ID: 01555334
Message ID: 01555490
Views: 78
Grady,

As others have pointed out, your code as-is won't really work: after you call Navigate() you have to wait for the page to finish loading before you can access the Document object or do anything useful with it. Basically you have to poll ReadyState until it returns 4 (READYSTATE_COMPLETE), with a timeout as a fallback, to ensure the document has loaded.
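
Something along these lines would do it (a minimal sketch; the variable names and the 30-second timeout are just illustrative):

loIE = CREATEOBJECT("InternetExplorer.Application")
loIE.Visible = .F.
loIE.Navigate(lcUrl)

*** Wait for READYSTATE_COMPLETE (4) or give up after 30 seconds
lnStart = SECONDS()
DO WHILE loIE.ReadyState # 4 AND SECONDS() - lnStart < 30
   DOEVENTS   && yield so the IE COM server can process its message loop
ENDDO

IF loIE.ReadyState = 4
   *** Only now is it safe to touch the Document object
   ? loIE.Document.Title
ELSE
   ? "Timed out waiting for the page to load"
ENDIF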

I don't know what you're trying to do here, but to build a spider it's usually better to use plain HTTP requests (à la wwHttp or XmlHttp) to retrieve the HTTP responses. This is much faster and more efficient, since there's no browser rendering involved, and the requests can be made synchronously, one at a time. Parsing the links out of a document is not terribly difficult either; there are ActiveX and .NET libraries that can do it for you if you don't want to build it yourself, although a simple regular expression handles it just as easily.
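
For example, here's a minimal sketch of that approach using MSXML2.XMLHTTP for the synchronous request and the VBScript.RegExp COM object for the link extraction. lcUrl is assumed to hold the page address, and the pattern below only handles double-quoted href attributes:

*** Fetch the page synchronously - no browser instance involved
loHttp = CREATEOBJECT("MSXML2.XMLHTTP")
loHttp.Open("GET", lcUrl, .F.)   && .F. = synchronous
loHttp.Send()
lcHtml = loHttp.ResponseText

*** VBScript.RegExp gives VFP regular expression support
loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.Pattern = 'href\s*=\s*"([^"]+)"'
loRegEx.IgnoreCase = .T.
loRegEx.Global = .T.

FOR EACH loMatch IN loRegEx.Execute(lcHtml)
   ? loMatch.SubMatches(0)   && the captured URL
ENDFOR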

+++ Rick ---

>With the help of some skilled VFP coders on the UT, I am having some fun with a web bot.
>I had success by splitting it into two parts. The first, using a homepage URL, retrieved the sub-URLs from the href HTML in the home page.
>The second part used the URLs (saved in a table) to navigate to the pages and retrieve the text.
>
>Now I am having trouble merging the concept into one piece of code. Here's what's happening:
>
>I use this segment of code to get the href URLs
>
>o = CREATEOBJECT("InternetExplorer.Application")
>o.VISIBLE=.F.
>o.NAVIGATE(lcURL)
>FOR EACH loLink IN o.DOCUMENT.Links
>lcURL = [ ] + loLink.Href + [ ]
>
>o.NAVIGATE(lcURL) 
>* The error generated is OLE Error code 0x80070005: Access is denied.
>
>* I want to navigate to each lcURL as it is retrieved, process the text and then navigate to the next href lcURL
>* until the website's href URLs have been processed, then move on to the next homepage and process its href URLs
>
>ENDFOR && below a lot of filtering code etc.
>
>Is there a solution for this, or should I collect all the href URLs from all the sites into a table and then proceed with return visits to collect the text?
+++ Rick ---

West Wind Technologies
Maui, Hawaii

west-wind.com/
West Wind Message Board
Rick's Web Log
Markdown Monster
---
Making waves on the Web

Where do you want to surf today?