URL follow
Message
From
21/11/2010 10:10:46
To
21/11/2010 03:35:37
General information
Forum: Visual FoxPro
Category: Coding, syntax and commands
Title: Miscellaneous
Thread ID: 01489932
Message ID: 01490008
Views: 47
>>>>Is there VFP code available that will navigate to all the sub-links under a primary URL? And, perhaps even follow links outside the URL?
>>>
>>>Does this help:
>>>
>>>Crawler Download #9894
>>
>>
>>It's close :) Does the download come with source code?
>
>Seems you want a crawling/extracting robot. If you are working from a home line, I recommend going back to automating IE, as many of your problems are more than halfway eliminated by the already parsed state of the page in the IE DOM. For this specific use case you should iterate the links collection, which can be handled as a zero-based array from VFP. As I constructed quite a few of those in IE4 times, I know what a dev-time saver the object model can be. The limiting factor at runtime will probably always be the internet connection or the server, if you start parallel processes on multicore machines, even if parsing/rendering in IE takes more time than dedicated code.
>
>regards
>
>thomas

Thanks Thomas. Yes, I am writing my first crawler. I am trying to find a way to get to the areas of news sites where people can leave comments. I want to mine the comments for snippets of the writing. The URLs for the comment areas have additional information appended.

Example: Primary URL http://world-news.newsvine.com/
Example: Secondary URL. http://world-news.newsvine.com/_news/2010/11/21/5502717-report-would-be-plane-bombers-post-attack-details#comments

So I want to know if I can (with VFP and help from the UT) determine the added portion of the secondary URL.
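
To make the question concrete, here is a rough sketch of the logic in Python rather than VFP, using only the standard library: collect the links on a page, resolve them against the primary URL, and report what each one adds to it. The names `LinkCollector` and `added_portion` and the `SAMPLE_HTML` snippet are made up for illustration; a real run would feed in the downloaded page source. The path check is a plain prefix match, so treat it as a starting point only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def added_portion(primary, secondary):
    """Return what secondary adds to primary, or None if it lies elsewhere."""
    p, s = urlsplit(primary), urlsplit(secondary)
    if p.netloc.lower() != s.netloc.lower() or not s.path.startswith(p.path):
        return None
    extra = s.path[len(p.path):]
    if s.query:
        extra += "?" + s.query
    if s.fragment:
        extra += "#" + s.fragment
    return extra


# Hypothetical page source standing in for the downloaded primary page.
PRIMARY = "http://world-news.newsvine.com/"
SAMPLE_HTML = """
<a href="/_news/2010/11/21/5502717-report-would-be-plane-bombers-post-attack-details#comments">Comments</a>
<a href="http://example.com/elsewhere">Off-site</a>
"""

collector = LinkCollector(PRIMARY)
collector.feed(SAMPLE_HTML)
for link in collector.links:
    print(link, "->", added_portion(PRIMARY, link))
```

Links for which `added_portion` returns `None` point outside the primary URL, so the same check can decide whether the crawler follows them or stays on the site. Filtering on a `#comments` ending would narrow the list to the comment areas.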

I could also use the crawler above to get a lot of data, but I also want to 'know' what code is used to acquire all the text from a website. I don't want the HTML etc. Just text. With Tore's help I am getting to the primary URL and extracting text snippets. I just need to drill down a bit more.
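
For the "just text, no HTML" part, this is the kind of thing I have in mind, again sketched in Python with the standard library only. `TextExtractor`, `html_to_text`, and the `SAMPLE` page are hypothetical names and data for illustration. The sketch keeps text nodes and skips whatever sits inside `<script>` and `<style>`; it does not try to tell comments apart from menus or ads.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keep text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace-only runs between tags.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical page source used only to exercise the sketch.
SAMPLE = """<html><head><title>Demo</title>
<style>p { color: red; }</style></head>
<body><p>First comment.</p><script>var x = 1;</script>
<p>Second &amp; last comment.</p></body></html>"""

print(html_to_text(SAMPLE))
```

When automating IE as Thomas suggests, the already parsed DOM does this work for you, so a parser like the one above only matters if the page source is fetched directly.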
I ain't skeert of nuttin eh?
Yikes! What was that?