Cetin,
First of all, thanks a lot for your response. I learned many interesting things.
>
>
Nadya,
>Here is a copy of my first suggestion :
>"Similar in FP. Create a table or cursor with the corresponding types, then APPEND FROM, i.e.:
>
>create cursor mycursor (Name c(30),PayRate y,Comment c(80),OTRate y)
>append from myFile type delimited
>
>You could use lowlevel functions too.
>Cetin"
>In the original question, he was opening a file stream and reading delimited values. As long as the purpose is something like that, I would always use FP's internal parsing mechanism and this code.
Ok, I'll do it that way.
>Not only yours but mine too would be slower and/or unnecessary for this job.
>OTOH, arrays are limited to 65,000 elements, so you would probably read line by line: first collect the lines into an array, then put them into a cursor with a conversion for the types. You would need to add that conversion routine too.
>
>Your code works. I didn't criticize your code; I just gave you the info that the foxtools functions really make it slow to parse large text files, because I needed this in the past and tried them.
If I have a different situation - a string with fewer than 40 words, but a huge table where the string is a field value - would my method still be slower than yours?
I mean, I parse it inside a scan ... endscan loop.
Let's take a closer look at what it does:
>Think of a text file as a chain of train wagons :)
>words() is ok, it only counts the wagons once.
>The problem is with wordnum(). Each time it's called with a value n, it has to count the wagons from the start to reach the nth wagon. In sequential work, it doesn't remember where you left off. I hope I'm totally wrong.
>My logic was like this :
>-I know that I'll collect the wagons one by one, starting at the 1st and going sequentially to the last.
>-I don't need to rewind to the start to find the (n+1)th wagon when I'm already in front of the nth wagon :)
Yes, it's right logic :)
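To make the wagon analogy concrete, here is a minimal C sketch of the two access patterns (function names hypothetical; this is not the actual foxtools source): a wordnum()-style lookup that recounts from the start on every call, versus a cursor that remembers where it left off.

```c
#include <stddef.h>

/* Hypothetical sketch of the wordnum() access pattern: every call recounts
 * the wagons from the start, so fetching all n words costs O(n^2) visits. */
static const char *nth_word(const char *s, int n, size_t *len)
{
    int word = 0;
    while (*s) {
        while (*s == ' ') s++;          /* skip separators */
        if (!*s) break;
        const char *start = s;
        while (*s && *s != ' ') s++;    /* walk one wagon */
        if (++word == n) { *len = (size_t)(s - start); return start; }
    }
    return NULL;                        /* fewer than n words */
}

/* Sequential cursor: remembers where it left off, so a full pass is O(n). */
static const char *next_word(const char **cursor, size_t *len)
{
    const char *s = *cursor;
    while (*s == ' ') s++;              /* skip separators */
    if (!*s) return NULL;               /* end of string */
    const char *start = s;
    while (*s && *s != ' ') s++;        /* walk one wagon */
    *len = (size_t)(s - start);
    *cursor = s;                        /* remember position for next call */
    return start;
}
```

Calling nth_word() for n = 1..N touches roughly N²/2 wagons in total; draining next_word() touches each wagon once.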
>C sees strings and files as arrays of characters - just what we want :) Array processing is fast. I have two C routines. One parses comma-delimited words into an array; I use it primarily to parse lists separated by commas, such as a field list.
Could you extend it to use any delimiter?
>The other parses "words" (word is defined in a word class) from a file stream, inserts them into a table, sorted and unique (a table class and a B+tree class), and writes them out into another file stream, one per line. I used this one as the basis for indexing any text file.
>The first one is similar to what your routine does :
>
>#include <pro_ext.h>
>char FAR *NullTerminate(Value FAR *cVal)
>{
> char *RetVal;
> if (!_SetHandSize(cVal->ev_handle, cVal->ev_length + 1))
> {
> _Error(182); // "Insufficient memory"
> }
>
> ((char FAR *) _HandToPtr(cVal->ev_handle))[cVal->ev_length] = '\0';
> RetVal = (char FAR *) _HandToPtr(cVal->ev_handle);
> return RetVal;
>}
>
>void FAR Lst2Array(ParamBlk FAR *parm)
>{
> #define MAXLEN 254
> Value val;
> char FAR *InString = NullTerminate(&parm->p[0].val);
> char FAR *ArrayName = NullTerminate(&parm->p[1].val);
> int Slen = _StrLen(InString);
>
> Locator loc;
> NTI nti;
> if ( (nti = _NameTableIndex(ArrayName)) != -1 )
> {
> if ( _FindVar(nti, -1, &loc) ) // If exists - release
> _Release(nti);
> }
> int i=0;
> int count=0;
> while(InString[i] != '\0') // Count number of commas in list
> {
> if ( InString[i] == ',' )
> count++;
> i++;
> }
> count++; // List tokens = Commas + 1
>
>
> loc.l_subs = 1;
> loc.l_sub1 = count; // Array len is count
> loc.l_sub2 = 0;
> _NewVar(ArrayName,&loc,NV_PUBLIC); // Create one dimension public array
>
> val.ev_type = 'C';
> val.ev_length = MAXLEN;
> val.ev_handle = _AllocHand(MAXLEN);
> _HLock(val.ev_handle);
> char FAR *rep = (char FAR *) _HandToPtr(val.ev_handle);
> i=count=0;
> while(InString[i] != '\0') // Scan string
> {
> int j = 0 ;
> while (InString[i] != '\0' && InString[i] != ',' && j < MAXLEN)
> rep[j++] = InString[i++] ; // Copy current list element
> rep[j] = '\0'; // null terminate
>
> val.ev_length = j;
> count++;
> loc.l_sub1 = count; // Set next array elem - FP arrays start at 1
> _Store(&loc, &val); // Store to array elem
>
> if (InString[i] == ',') // Step over the delimiter
> i++;
> while (InString[i] == ' ') // Drop spaces before the next element
> i++;
> }
> _HUnLock(val.ev_handle);
> _FreeHand(val.ev_handle); // Release the scratch handle
>}
>
>FoxInfo myFoxInfo[] =
>{
> {"LIST2ARRAY", (FPFI) Lst2Array, 2, "CC"},
>};
>
>
>extern "C" {
>FoxTable _FoxTable =
>{
> (FoxTable *) 0, sizeof(myFoxInfo)/sizeof(FoxInfo), myFoxInfo
>};
>}
>If you have C++, you could build this into an FLL.
Could you please tell me exactly how to do that? (I've done this in the past using Watcom and FPD 2.6, but it was a while ago and I don't remember.) Can I use Visual C++ 6.0?
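As far as I know, Visual C++ works for this: an FLL is an ordinary Win32 DLL that includes pro_ext.h, links winapims.lib from VFP's Api directory, and is named with an .fll extension. A command-line sketch, with paths assumed for a default VFP 6 install (adjust to yours; not verified against the VC++ 6.0 project wizards):

```shell
# Hypothetical build recipe -- paths below are assumptions.
cl /LD /I"C:\Program Files\Microsoft Visual Studio\Vfp98\Api" str2arr.c ^
   "C:\Program Files\Microsoft Visual Studio\Vfp98\Api\winapims.lib" ^
   /link /out:str2arr.fll
```

Once built, `set library to str2arr.fll` loads it from VFP.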
>You could call it like yours :
>=list2Array("Some, text string, parser","aMyArray")
>It would create array "aMyArray" with elements :
>aMyArray[1] = "Some"
>aMyArray[2] = "text string"
>aMyArray[3] = "parser"
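For comparison outside FoxPro, here is a portable C sketch of the same tokenizing idea, minus the FoxPro API calls. The leading-space trim after each comma is my assumption, made so that " text string" comes back as "text string" as in the output above.

```c
#define MAXTOK 254  /* mirrors MAXLEN in the FLL */

/* Hypothetical standalone version of the Lst2Array loop: split a
 * comma-delimited list into out[0..max-1]; returns the token count. */
static int list_to_array(const char *in, char out[][MAXTOK + 1], int max)
{
    int count = 0, i = 0;
    while (in[i] != '\0' && count < max) {
        int j = 0;
        while (in[i] != '\0' && in[i] != ',' && j < MAXTOK)
            out[count][j++] = in[i++];   /* copy current element */
        out[count][j] = '\0';            /* null terminate */
        count++;
        if (in[i] == ',')                /* step over the delimiter */
            i++;
        while (in[i] == ' ')             /* drop spaces before next element */
            i++;
    }
    return count;
}
```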
>
>If you can try it and see the speed difference yourself :) Here is a test code and results on my computer :
>
>use _samples+"\data\orders"
>copy to temp.txt type delimited
>lcString = filetostr("temp.txt")
>
>start = seconds()
>dimension aWords[1]
>lnWords = aParser(@aWords, lcString, ",")
>? seconds()-start, lnWords
>
>start = seconds()
>set library to str2arr.fll additive
>=list2Array(lcString, "aMyArray")
>? seconds()-start, alen(aMyArray)
>
>Both return a word count of 18607.
>aParser() finished in 473.689 secs (and I thought I had a hang); list2array() took 0.278 secs to finish. My test might not be a good one but I see the logic difference like this :
>go top
>lnCurRec = 0
>do while !eof()
> go top
> for ix = 1 to lnCurRec
> skip
> endfor
> lnCurRec = lnCurRec + 1
> * Get field value
>enddo
>
>* What I like
>scan
> * Get field value
>endscan
In conclusion, I maintain that the foxtools wordnum() function is not suitable for sequential word parsing unless the string has only a few words.
>Cetin
If it's not broken, fix it until it is.