Side by side comparison (strings & local data)
Message
To:
24/12/2003 07:11:44
Cetin Basoz
Engineerica Inc.
Izmir, Turkey
General information
Forum: Visual FoxPro
Category: Visual FoxPro and .NET
Miscellaneous
Thread ID: 00861648
Message ID: 00861947
Views: 51
Hi Cetin,

This is not directly related to the sample in this thread, but here is something I did that involves string processing in VFP and .NET.

I had one interesting project this year. The task was to split big text files of statements coming off the mainframe (around 50-70 MB on average) into thousands of individual statements of 5 different kinds.
In addition, some information had to be extracted from each statement to build a metadata record for it. The individual statements could be of any size. All of that had to be done in VB.NET. I could identify the beginning of a single statement by a certain header line, but some of the metadata to be extracted did not have a fixed position in the document; it was "floating" within the line. So the input file had to be parsed into individual lines, and several header lines had to be analysed for their content.

Of course, out of curiosity, I did it first in VFP to see what kind of speed I could get. The fastest VFP solution was:
1. Load the complete text file content into a DBF with the APPEND FROM command.
2. Write a SELECT statement that identifies the header line of each statement and creates a unique ID for each statement.
3. Assign the statement's unique ID to all lines of the respective statement and index the DBF on the ID field.
4. SCAN through the cursor of statement IDs and SELECT the content of each statement into a temporary cursor.
5. Analyse the temporary cursor to extract the necessary metadata.
6. Dump the temporary cursor into a text file with the COPY TO command.
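
The table-oriented steps above can be sketched roughly as follows. This is not the original VFP code: it is a minimal Python illustration in which an in-memory SQLite table stands in for the DBF, and the header marker `HEADER_PREFIX` and the function name `split_set_based` are hypothetical. The metadata analysis (step 5) and the file output (step 6) are left out.

```python
import sqlite3

HEADER_PREFIX = "STATEMENT"  # hypothetical marker for a statement's first line


def split_set_based(lines):
    """Group lines into statements with a table, returning {statement_id: [lines]}."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE raw (line_no INTEGER PRIMARY KEY, txt TEXT, stmt_id INTEGER)"
    )
    # Step 1: bulk-load every line (stand-in for APPEND FROM).
    db.executemany(
        "INSERT INTO raw (line_no, txt) VALUES (?, ?)", enumerate(lines, start=1)
    )
    # Steps 2-3: a line's statement ID is the line number of the nearest
    # header line at or above it; lines before the first header get NULL.
    db.execute(
        "UPDATE raw SET stmt_id = ("
        " SELECT MAX(h.line_no) FROM raw AS h"
        " WHERE h.line_no <= raw.line_no AND h.txt LIKE ?)",
        (HEADER_PREFIX + "%",),
    )
    db.execute("CREATE INDEX idx_stmt ON raw (stmt_id)")
    # Step 4: walk the statement IDs and select each statement's lines.
    ids = [
        row[0]
        for row in db.execute(
            "SELECT DISTINCT stmt_id FROM raw"
            " WHERE stmt_id IS NOT NULL ORDER BY stmt_id"
        )
    ]
    statements = {}
    for stmt_id in ids:
        rows = db.execute(
            "SELECT txt FROM raw WHERE stmt_id = ? ORDER BY line_no", (stmt_id,)
        )
        statements[stmt_id] = [row[0] for row in rows]
    db.close()
    return statements
```

Calling `split_set_based(open(path).read().splitlines())` on a small sample file is enough to try it; no claim is made that this matches the speed of the VFP version.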
I am no expert in .NET; this is my first year working with it. The task coincided with the DevTeach conference, and I asked several .NET experts there for their opinion on how to approach it, but I did not get any useful answer. It appeared that nobody had done this kind of big text file processing in .NET.

So I had to figure it out myself, using the StreamReader, StringBuilder and StreamWriter classes:
1. The input text file was read line by line with StreamReader.
2. The statement header lines were identified.
3. Each line of the statement header was checked and the metadata were extracted.
4. StringBuilder was used to accumulate the individual statement lines.
5. When a single statement was complete, it was written to a file with StreamWriter.
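
The streaming steps above have roughly the following shape. This is a Python sketch, not the VB.NET code from the project: a list plays the role of the StringBuilder, any iterable of lines plays the role of the StreamReader, and the `emit` callback plays the role of the StreamWriter. The header marker `HEADER_PREFIX` and the `ACCT:` field are hypothetical examples of a header line and a "floating" metadata value.

```python
import io
import re

HEADER_PREFIX = "STATEMENT"                # hypothetical header marker
ACCOUNT_RE = re.compile(r"ACCT:\s*(\d+)")  # hypothetical "floating" field


def split_streaming(reader, emit):
    """Read lines one at a time; call emit(account, text) once per statement."""
    buffer = []     # accumulates the lines of the current statement
    account = None  # metadata found so far for the current statement
    count = 0

    def flush():
        nonlocal buffer, account, count
        if buffer:
            emit(account, "".join(buffer))
            count += 1
        buffer, account = [], None

    for line in reader:
        if line.startswith(HEADER_PREFIX):
            flush()              # the previous statement is complete
            buffer.append(line)
        elif buffer:
            buffer.append(line)
        else:
            continue             # text before the first header is skipped
        if account is None:
            # search() finds the field wherever it sits within the line
            match = ACCOUNT_RE.search(line)
            if match:
                account = match.group(1)
    flush()                      # write out the last statement
    return count
```

With a real file, `reader` would be the open file object and `emit` would write `text` to a per-statement output file; only one statement is held in memory at a time.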
The speed of the VB.NET version was disappointing at first: 20 times slower than VFP.
Then I optimized the string processing by using StringBuilder and the Concat function, and by avoiding the & operator as much as possible.

That sped it up significantly and brought it close to VFP speed, but it was still about 2 times slower.

I was actually impressed by how fast the StreamReader, StringBuilder and StreamWriter classes work with big texts.

However, in my case I found that in .NET, just reading the file line by line with StreamReader and immediately writing each line with StreamWriter (with no processing in between) took about 1.5 times longer than doing the whole processing in VFP.
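
A bare read-and-write pass like the one described above is a handy baseline, because it shows how much of a run is pure I/O. The Python sketch below illustrates the measurement only; the function name `copy_lines` is hypothetical, and its timings say nothing about the .NET or VFP figures quoted here.

```python
import time


def copy_lines(src_path, dst_path):
    """Copy a text file line by line with no processing; return (lines, seconds)."""
    started = time.perf_counter()
    lines = 0
    # latin-1 maps every byte to one character and newline="" disables newline
    # translation, so the copy is byte-for-byte identical to the source.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            dst.write(line)
            lines += 1
    return lines, time.perf_counter() - started
```

Comparing this figure with the time of the full split shows how much room is left for optimizing the string handling itself.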

BTW, the VFP version of the program was about 3 times smaller in size.

>>
>>In addition, ALEN(laFields, 1) can be taken out of SCAN loop and assigned the variable just once.
>
>Hi Nick,
>Yes and yet many more - ie: might even try directly constructing the sql that'd give the result already transformed and tagged:) I used Dragan's approach in one of my needs and it really speeds up the things more.
>You VFP guys never confident with speed :)
>OK I did another version and didn't seek for further optimization :
>
>
>StartTime=Seconds()
>lcMyXML=ToXML('testcursor')
>*Show Time-Taken
>?'Before Test Tag added',Seconds() - StartTime
>
>*Add test tag
>lcTemp = Sys(2015)+'.tmp'
>StrToFile('<TEST>'+Chr(13),m.lcTemp)
>Strtofile(m.lcMyXML,m.lcTemp,.t.)
>StrToFile('</TEST>',m.lcTemp,.t.)
>?'After Test Tag added',Seconds() - StartTime
>
>* Check what really we've generated
>Modify Command (m.lcTemp)
>Erase (m.lcTemp)
>
>Function ToXML
>Lparameters tcCursorName, tcWhere
>Local lcChar,lcTemp,lcXML,lcFields,ix
>lcChar=Chr(13)
>lcTemp = Sys(2015)+'.tmp'
>tcWhere = Iif(Empty(m.tcWhere),'',m.tcWhere)
>Select * from (tcCursorname) &tcWhere into cursor _tmpXML
>Local lcFields
>Set Textmerge Delimiters To "%%","%%"
>Set Textmerge To Memvar m.lcFields Noshow
>Set Textmerge On
>\\f0='<CLIENT>',
>For ix=1 To Fcount()
>  If !Type(Field(ix))$'GM'
>		\\f%%ix%%='<%%Field(ix)%%>'+
>		\\%%Iif(Type(Field(ix))$'CM','Trim(','Transform(')%%%%Field(ix)%%)
>		\\+'</%%Field(ix)%%>',
>  Endif
>Endfor
>\\f%%ix%%='</CLIENT>'
>Set Textmerge To
>Set Textmerge Off
>Set Fields Global
>Set Fields To &lcFields
>Copy To (m.lcTemp) Type Delimited With "" With Character &lcChar
>USE in '_tmpXML'
>Set Fields to
>lcXML = Filetostr(m.lcTemp)
>Erase (m.lcTemp)
>Return m.lcXML
>
>And timings on my box (Athlon 650Mhz, 192Mb RAM) :
>Final XMLed file size for this test was 22Mb
>
>VFP did that in 8.01 - 8.20 seconds w/o test tag and 9.5 - 9.6 seconds with test tag added. There I did a read all and write again as you noticed.
>In the meantime I also tried not doing a select but a use again + for clause in 'copy to' and it further decreased to time around 7.0-7.4 seconds
>I still feel I have the power to make it faster than that but seemed enough to me for this test:)
>
>Now timings for C# with the same 'TestCursor' (and I tried to make the environment best for C# as far as I know - what I didn't do is trying to optimize the code Kevin supplied, IOW C# test code is his code except only the path to table) :
>C# did it in 69 seconds if run directly from development IDE w/o debugging. Well actually it was its start-end timing, my wristwatch showed it took more than 100 secs. to see the result.
>I compiled to .exe with csc and that version took 44 seconds, after I NGen'd it was down only 1 sec and took 43 seconds.
>PS: I really don't know why yesterday it took 132 secs for C# to complete, maybe it had less free memory when I tried yesterday or I might have tried with debugging.
>I don't say so VFP is exactly better with these tests, but at least it showed with in the aspects of such a test it's much faster + its IDE capability is much better (I hate development time penalties of .NET if you've to try and go into edit even to modify a single line of code - VFP's interactivity alone is a feature I love, hope there is some in .NET too which I'm not yet aware of).
>Cetin
Nick Neklioudov
Universal Thread Consultant
3 times Microsoft MVP - Visual FoxPro

"I have not failed. I've just found 10,000 ways that don't work." - Thomas Edison