Screen scraping from a web page
Message
General information
Forum: Visual FoxPro
Category: Other
Miscellaneous
Thread ID: 01273270
Message ID: 01273385
Views: 23
>it's going to be for one person, programmer always present.
>
>
>One needs to investigate the structure of the two initial pages, get their
>elements, figure out which elements to click to get to the third page, and
>then grab the information from there.
>
>My question: how does one go about the above that you mention?
>
>Thanks again.

Ok, here is an example of how to do it with the very first page:
pURL="http://www.sunbiz.org/jlilist.html"
READYSTATE_COMPLETE = 4
MAX_TIME =30

oIE = CreateObject("InternetExplorer.Application")
oIE.visible=.t.
oIE.Navigate(pURL)

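lnStarted = SECONDS()
lnWaiting = 0

* Poll until the page has finished loading or MAX_TIME seconds have elapsed
DO WHILE oIE.ReadyState <> READYSTATE_COMPLETE OR oIE.Busy
	lnWaiting = SECONDS() - lnStarted
	IF lnWaiting >= MAX_TIME
		EXIT
	ENDIF
	DOEVENTS	&& let pending Windows events be processed while waiting
ENDDO

IF lnWaiting >= MAX_TIME
	oIE = NULL
	=MESSAGEBOX("Unable to connect, timeout...")
	RETURN
ENDIF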

SUSPEND	  && in the browser, type a debtor name such as HELLO so the input element can be recognized in the next step, then RESUME

DO iGetallelements WITH oie.Document.ALL

SUSPEND	 && analyze the cursor contents, look for "HELLO"$lcrec or "HELLO"$lchtml
* This way I was able to find the elements named "inquiry_value"
* and "submit". The two lines below then navigate to the second page,
* which displays the list of names starting with "B":

oie.Document.ALL.item("inquiry_value").value="b"
oie.Document.ALL.item("submit").click
*.....

return
*---------------------------
PROCEDURE iGetallelements
LPARAMETERS lo
IF TYPE("lo")#"O"
	lo=oie.Document.ALL
endif

CREATE CURSOR irepo (lcrec c(150), lchtml m, lctxt m)

WITH lo
	FOR ia = 0 TO .length - 1
		* One summary line per element: index, tag name, then any non-empty attributes
		m.lcrec = TRANSFORM(ia) + ". " + ;
			"TagName: " + .Item(ia).tagName + ;
			IIF(TYPE(".Item(ia).type") = "C" AND NOT EMPTY(.Item(ia).type), ;
				" Type: " + .Item(ia).type, "") + ;
			IIF(TYPE(".Item(ia).classname") = "C" AND NOT EMPTY(.Item(ia).classname), ;
				" Classname: " + .Item(ia).classname, "") + ;
			IIF(TYPE(".Item(ia).name") = "C" AND NOT EMPTY(.Item(ia).name), ;
				" Name: " + .Item(ia).name, "") + ;
			IIF(TYPE(".Item(ia).value") # "U" AND NOT EMPTY(TRANSFORM(.Item(ia).value)), ;
				" Value: " + TRANSFORM(.Item(ia).value), "") + ;
			IIF(TYPE(".Item(ia).title") = "C" AND NOT EMPTY(.Item(ia).title), ;
				" Title: " + .Item(ia).title, "") + ;
			IIF(TYPE(".Item(ia).src") = "C" AND NOT EMPTY(.Item(ia).src), ;
				" Src: " + .Item(ia).src, "")

		* Full markup and visible text of the element, stored in the memo fields
		m.lchtml = IIF(TYPE(".Item(ia).innerhtml") = "C" AND NOT EMPTY(.Item(ia).innerhtml), ;
			.Item(ia).innerhtml, "")
		m.lctxt = IIF(TYPE(".Item(ia).innertext") = "C" AND NOT EMPTY(.Item(ia).innertext), ;
			.Item(ia).innertext, "")

		INSERT INTO irepo FROM MEMVAR
	ENDFOR
ENDWITH
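
For the second SUSPEND step (looking for "HELLO" in the cursor), something along these lines could be typed in the Command window while the program is suspended. It is only a sketch: it assumes the irepo cursor created by iGetallelements is still open, and that the marker text was typed as HELLO. The UPPER() calls are my addition, so the match does not depend on case.

```foxpro
* Sketch only: run from the Command window while the program is suspended.
* Assumes the irepo cursor built by iGetallelements is still open.
SELECT irepo
BROWSE FOR "HELLO" $ UPPER(lcrec) OR "HELLO" $ UPPER(lchtml)
```

The rows that remain show the tag name, type and name of the matching elements in lcrec, which is how names such as "inquiry_value" can be spotted.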