A little crawler prob
Message
From
23/11/2010 13:18:09
 
General information
Forum:
Visual FoxPro
Category:
Coding, syntax & commands
Miscellaneous
Thread ID:
01490197
Message ID:
01490271
Views:
54
Right, and Release lcHTML before that.
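
Put together, the cleanup order would look roughly like the sketch below. It is only a sketch: it assumes the InternetExplorer.Application COM object is available on the machine, "about:blank" is a placeholder for the real URL, and the 12-second wait and the lnStart variable are illustrative additions, not part of the original code.

```foxpro
LOCAL lox, lcHTML, lnStart
lox = CREATEOBJECT("InternetExplorer.Application")
lox.Visible = .F.
lox.Navigate("about:blank")   && placeholder URL (assumption)

* Wait for the page, giving up after 12 seconds (illustrative limit)
lnStart = SECONDS()
DO WHILE (lox.ReadyState # 4 OR lox.Busy) AND SECONDS() - lnStart < 12
	DOEVENTS
ENDDO

lcHTML = lox.Document.body.outerText
* ... search lcHTML for the words here ...

RELEASE lcHTML   && drop the page text first
lox.Quit()       && ask this Internet Explorer instance to close
lox = NULL       && then drop the COM reference
```

In the quoted loop the same three closing lines would take the place of the RELEASE lox just before NEXT Num, so each URL closes its own browser instance before the next one is created.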

>Add
>
>lox.Quit()
>lox = NULL
>
>>
>>
>>******************************************
>>* http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds
>>* http://evolvingtrends.wordpress.com/2006/07/09/e-society-p2p-application/
>>* Gather information from blogs, comment sites, forums etc
>>* Process the postings to determine what the group suggests
>>* about near-future events that MAY occur.
>>* Maybe this should be put on codeplex so any UT person can
>>* contribute to the project.
>>******************************************
>>* NetScrape
>>* KUDOS to Tore who supplied the original code
>>SET DEFAULT TO C:\scraper
>>CLOSE TABLES
>>USE srchwrds IN 1 && A table of words that determine which snippets to collect
>>USE URLs IN 2 && A table of URLs of blogs, comment sites, forums etc
>>USE results IN 3 && Stores the snippets (about 1000 characters or so)
>>SELECT srchwrds
>>COUNT TO SearchWord
>>SELECT URLs
>>COUNT TO URLcount
>>FOR Num = 1 TO URLcount
>>	GOTO Num
>>	GotUrl = location
>>	GotUrl = ALLTRIM(GotUrl)
>>	lcURL = TRANSFORM(GotUrl)
>>	lnTime=12 && 12 seconds; NavComplete compares this against SECONDS(), which counts seconds
>>	lox=CREATEOBJECT('internetexplorer.application')
>>	lox.VISIBLE=.F.
>>	lox.NAVIGATE(lcURL)
>>	lox.VISIBLE=.F.
>>	WAIT 'Navigating!' WINDOW NOWAIT
>>	IF !NavComplete(lox,lnTime)
>>		??CHR(7)
>>		WAIT 'Timeout fail!' WINDOW
>>		lox.VISIBLE=.T.
>>		RELEASE lox
>>		RETURN
>>	ENDIF
>>	WAIT CLEAR
>>	lcHTML=lox.DOCUMENT.body.outertext && or innertext to ignore URLs
>>* Find a string inside a string
>>	SELECT srchwrds
>>	FOR WordNum = 1 TO SearchWord
>>		GOTO WordNum
>>		Searching = ALLTRIM(words)
>>		Searching = TRANSFORM(Searching)
>>		SaveMe = ""
>>		FoundAt = ATC(Searching,lcHTML) && 0 when the word is not on the page
>>		IF FoundAt > 0
>>			SaveMe = SUBSTR(lcHTML,MAX(1,FoundAt-300),450) && Get words on either side of the search word.
>>		ENDIF
>>*		? SaveMe  && For testing
>>		IF LEN(ALLTRIM(SaveMe)) > 1
>>			SELECT results
>>			APPEND BLANK
>>			REPLACE captured WITH DATE()
>>			REPLACE usedkey WITH srchwrds.words
>>			REPLACE stored WITH SaveMe
>>		ENDIF
>>		SaveMe = ""
>>		SELECT srchwrds
>>	NEXT WordNum
>>	SELECT URLs
>>	RELEASE lox
>>NEXT Num
>>*****************************************************************************
>>FUNCTION NavComplete
>>LPARA toIE, tnTimeout
>>lnTimeout=IIF( TYPE("tnTimeout")="N",tnTimeout ,60 )
>>lnTimeElapsed=0
>>lnStartSeconds=SECONDS()
>>DO WHILE .T.
>>	IF toIE.ReadyState=4 AND !toIE.Busy
>>		DO WHILE .T.
>>			IF toIE.DOCUMENT.ReadyState="complete"
>>				RETURN .T.
>>			ENDIF
>>			IF (SECONDS()-lnStartSeconds)>lnTimeout
>>				RETURN .F.
>>			ENDIF
>>		ENDDO
>>	ENDIF
>>	IF (SECONDS()-lnStartSeconds)>lnTimeout
>>		RETURN .F.
>>	ENDIF
>>ENDDO
>>ENDFUNC
>>