Parsing XML with DOM - Level Extreme

Parsing XML with DOM

Message

20/11/2004 18:01:25

Don Freeman
Pag
Tucson, Arizona, United States

20/11/2004 02:59:39

Martina Jindrová
Eg Expert, S.R.O.
Trutnov, Czech Republic

General information

Forum: Visual FoxPro

Category: XML, XSD

Title: Re: Parsing XML with DOM

Environment versions

Visual FoxPro: VFP 7

Miscellaneous

Thread ID: 00962465

Message ID: 00963246

Martin -

Thanks for your reply. Actually, the XML file is very large (10 MB), and the sample I provided was only meant to illustrate part of its structure. The actual headers in the file are:

<?xml version='1.0' encoding='utf-8'?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/' xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:SOAP-ENC='http://schemas.xmlsoap.org/soap/encoding/'>

<SOAP-ENV:Body>
<ns1:downloadResponse
SOAP-ENV:encodingStyle='http://schemas.xmlsoap.org/soap/encoding/'
xmlns:ns1='urn:TMSWebServices'>
<xtvdResponse xsi:type='ns1:xtvdResponse'>

Some of the sections look like this:

<stations>
<station id='10021'>
<callSign>AMC</callSign>
<name>AMC</name>
<affiliate>Satellite</affiliate>
</station>
<station id='16331'>
<callSign>ANIMAL</callSign>
<name>Animal Planet</name>
<affiliate>Satellite</affiliate>
</station>
</stations>
<schedules>
<schedule program='EP1151270200' station='11867' time='2004-11-17T01:00:00Z' duration='PT00H30M' tvRating='TV-PG' stereo='true' closeCaptioned='true'/>
<schedule program='EP1151270201' station='11867' time='2004-11-17T01:30:00Z' duration='PT00H30M' tvRating='TV-PG' stereo='true' closeCaptioned='true'/>
<schedule program='EP2654380045' station='11867' time='2004-11-17T02:00:00Z' duration='PT00H30M' tvRating='TV-14' stereo='true' closeCaptioned='true'/>
<schedule program='EP2654380046' station='11867' time='2004-11-17T02:30:00Z' duration='PT00H35M' tvRating='TV-14' stereo='true' closeCaptioned='true'/>
<schedule program='EP6892960005' station='11867' time='2004-11-17T03:05:00Z' duration='PT01H00M'/>
<schedule program='EP4638260091' station='12131' time='2004-11-17T01:30:00Z' duration='PT00H30M' tvRating='TV-Y7' stereo='true' closeCaptioned='true'>
<part number='1' total='2'/>
</schedule>
<schedule program='MV1032330000' station='11867' time='2004-11-17T04:05:00Z' duration='PT01H45M' tvRating='TV-14' closeCaptioned='true'/>
<schedule program='MV1032330000' station='11867' time='2004-11-17T05:50:00Z' duration='PT01H45M' tvRating='TV-14' closeCaptioned='true'/>
<schedule program='MV0280340000' station='11867' time='2004-11-17T07:35:00Z' duration='PT02H00M' tvRating='TV-PG' closeCaptioned='true'/>
<schedule program='SH2148780000' station='11867' time='2004-11-17T09:35:00Z' duration='PT00H25M'/>
<schedule program='EP1282610060' station='11867' time='2004-11-17T10:00:00Z' duration='PT00H30M' tvRating='TV-PG' stereo='true' closeCaptioned='true'/>
<schedule program='EP1900270045' station='11867' time='2004-11-17T10:30:00Z' duration='PT00H30M' tvRating='TV-PG' stereo='true' closeCaptioned='true'/>
</schedules>
<programs>
<program id='MV0008290000'>
<title>Rooster Cogburn</title>
<description>One-eyed Marshal Cogburn (John Wayne) helps a Bible-toting spinster (Katharine Hepburn) find the men who killed her preacher father.</description>
<mpaaRating>PG</mpaaRating>
<starRating>**</starRating>
<runTime>PT01H47M</runTime>
<year>1975</year>
<series></series>
<advisories>
<advisory>Adult Situations</advisory>
<advisory>Violence</advisory>
</advisories>
</program>
</programs>

There are others, but these illustrate pretty well what I have. So the first question is: since this is UTF-8 and not a Windows encoding, can I write a schema that will work with it? What would be the proper header for the schema/DTD?
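On the encoding point, a conforming parser decodes the bytes according to the `encoding='utf-8'` declaration in the file itself, independently of any schema. The following is a minimal sketch in Python (chosen only for illustration, not the VFP used in this thread); the one-element document and the station name in it are hypothetical, and converting to Windows-1252 afterwards is just an assumption about what an ANSI consumer would want.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; each "é" is stored as two UTF-8 bytes.
RAW = "<?xml version='1.0' encoding='utf-8'?><name>Télé Québec</name>".encode("utf-8")

# Parsing from bytes lets the parser apply the declared encoding.
name = ET.fromstring(RAW).text

# Assumption: the consumer wants Windows-1252 (ANSI) bytes afterwards.
ansi = name.encode("cp1252", errors="replace")

print(name, len(RAW), len(ansi))
```

The conversion step happens after parsing, on the extracted values, so the XML file itself does not need to be rewritten.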

Also note that the file is inconsistent in that some of the schedule elements have an end tag while others do not. This is why I have been unable to "word process" it into a form that XMLTOCURSOR will read.
I have tried:

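* Read the raw download and wrap the contents of <schedules> in a single root element
mystr = filetostr('dddata.raw')
mystr = "<tv>" + strextract(mystr,"<schedules>","</schedules>") + "</tv>"
* Convert the wrapped XML into a cursor named tempxml
= xmltocursor(mystr,'tempxml')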

but I get a parse error, presumably because of the bad format of the source. I think the inconsistent format is caused by the length of the line, which seems to split when it has too many data fields.
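For reference, an empty-element tag such as `<schedule ... />` and a start tag with a matching `</schedule>` are both well-formed XML, so a DOM-style parser should accept the two forms side by side. Here is a minimal sketch in Python (used only for illustration, not VFP); the sample is trimmed from the `<schedules>` data above, and the `schedule_rows` helper and the `"1/2"` formatting of the part are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two rows trimmed from the <schedules> section above: one self-closed,
# one with a nested <part> element and an explicit end tag.
SAMPLE = """<schedules>
<schedule program='EP6892960005' station='11867' time='2004-11-17T03:05:00Z' duration='PT01H00M'/>
<schedule program='EP4638260091' station='12131' time='2004-11-17T01:30:00Z' duration='PT00H30M' tvRating='TV-Y7' stereo='true' closeCaptioned='true'>
<part number='1' total='2'/>
</schedule>
</schedules>"""

def schedule_rows(xml_text):
    """Return one dict per <schedule>, whichever tag form it uses."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("schedule"):
        row = dict(node.attrib)       # optional attributes are simply absent
        part = node.find("part")      # only some rows carry a <part> child
        if part is not None:
            row["part"] = part.get("number") + "/" + part.get("total")
        rows.append(row)
    return rows

rows = schedule_rows(SAMPLE)
for row in rows:
    print(row["program"], row["station"], row.get("tvRating"), row.get("part"))
```

Missing attributes such as `tvRating` come back as `None` through `row.get(...)`, so each row can be mapped to a fixed set of columns even though the source rows differ.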

My next question is: can the schema/DTD be developed incrementally to extract only one or two data sections, or does it need to be written for the entire file right off the bat? It would sure be a lot easier to work with it one section at a time.
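Working one section at a time is possible with a DOM-style parser even without a schema: load the whole document, then select only the elements of interest. The sketch below is in Python (for illustration only, not VFP). The document is assembled from the headers and the `<stations>` sample above, with the closing tags added as an assumption, and `read_stations` is a hypothetical helper. It also assumes the inner elements carry no default namespace, as in the sample shown; if the real file declares one, the tag names would need to be qualified.

```python
import xml.etree.ElementTree as ET

# Headers and <stations> sample from the post; closing tags are assumed.
DOC = """<?xml version='1.0' encoding='utf-8'?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<SOAP-ENV:Body>
<ns1:downloadResponse xmlns:ns1='urn:TMSWebServices'>
<xtvdResponse xsi:type='ns1:xtvdResponse'>
<stations>
<station id='10021'><callSign>AMC</callSign><name>AMC</name><affiliate>Satellite</affiliate></station>
<station id='16331'><callSign>ANIMAL</callSign><name>Animal Planet</name><affiliate>Satellite</affiliate></station>
</stations>
<schedules>
<schedule program='SH2148780000' station='11867' time='2004-11-17T09:35:00Z' duration='PT00H25M'/>
</schedules>
</xtvdResponse>
</ns1:downloadResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>""".encode("utf-8")

def read_stations(xml_bytes):
    """Extract only the <station> elements; every other section is ignored."""
    root = ET.fromstring(xml_bytes)
    return [
        {
            "id": st.get("id"),
            "callSign": st.findtext("callSign"),
            "name": st.findtext("name"),
            "affiliate": st.findtext("affiliate"),
        }
        for st in root.iter("station")   # searches the whole tree by tag name
    ]

stations = read_stations(DOC)
for st in stations:
    print(st["id"], st["callSign"], st["name"], st["affiliate"])
```

The same pattern can be repeated later for `schedule` or `program` elements without touching the code that handles stations.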

My thanks to you and all the others who have chimed in on this thread. As you can see, this is a totally new area for me.

- Don
