Cetin,
First of all, thanks a lot for your response. I learned many interesting things.
>
>
Nadya,
>Here is a copy of my first suggestion :
>"Similar in FP. Create a table or cursor with the corresponding types, then APPEND FROM, i.e.:
>
>create cursor mycursor (Name c(30),PayRate y,Comment c(80),OTRate y)
>append from myFile type delimited
>
>You could use lowlevel functions too.
>Cetin"
>In the original question, he was opening a file stream and reading delimited values. As long as the purpose is something like that, I would always use FP's internal parsing mechanism and this code.
Ok, I'll do it that way.
>Not only yours but mine too would be slower and/or unnecessary for this job.
>OTOH, arrays are limited to 65,000 elements, so you would probably read line by line: first collect the lines into an array, then put them into a cursor with a conversion for the types. You would need to add that conversion routine too.
>
>Your code works. I didn't criticize your code; I just gave you the info that the foxtools functions really make it slow to parse large text files, because I needed this in the past and tried them.
If I have a different situation - a string with fewer than 40 words, but a huge table where the string is a field value - would my method still be slower than yours?
I mean, I parse it inside a scan ... endscan loop.
Let's take a closer look at what it does:
>Think of a text file as a chain of train wagons :)
>words() is ok, it only counts the wagons once.
>The problem is with wordnum(). Each time it's called with a value n, it has to count the wagons from the start to reach the nth wagon. In sequential work, it doesn't remember where you left off. I hope I'm totally wrong.
>My logic was like this :
>-I know that I'll collect the wagons one by one, starting at the 1st and going sequentially to the last.
>-I don't need to rewind to the start to find the (n+1)th wagon when I'm already in front of the nth wagon :)
Yes, it's right logic :)
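To make the wagon analogy concrete, here is a minimal C sketch of the two access patterns (function names hypothetical; this is not the actual foxtools source): a wordnum()-style lookup that recounts from the start on every call, versus a cursor that remembers where it left off.

```c
#include <stddef.h>

/* Hypothetical sketch of the wordnum() access pattern: every call recounts
 * the wagons from the start, so fetching all n words costs O(n^2) visits. */
static const char *nth_word(const char *s, int n, size_t *len)
{
    int word = 0;
    while (*s) {
        while (*s == ' ') s++;          /* skip separators */
        if (!*s) break;
        const char *start = s;
        while (*s && *s != ' ') s++;    /* walk one wagon */
        if (++word == n) { *len = (size_t)(s - start); return start; }
    }
    return NULL;                        /* fewer than n words */
}

/* Sequential cursor: remembers where it left off, so a full pass is O(n). */
static const char *next_word(const char **cursor, size_t *len)
{
    const char *s = *cursor;
    while (*s == ' ') s++;              /* skip separators */
    if (!*s) return NULL;               /* end of string */
    const char *start = s;
    while (*s && *s != ' ') s++;        /* walk one wagon */
    *len = (size_t)(s - start);
    *cursor = s;                        /* remember position for next call */
    return start;
}
```

Calling nth_word() for n = 1..N touches roughly N²/2 wagons in total; draining next_word() touches each wagon once.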
>C sees strings and files as arrays of characters - just what we want :) Array processing is fast. I have two C routines. One parses comma-delimited words into an array; I use it primarily to parse lists separated by commas, such as a field list.
Could you extend it to use any delimiter?
>The other parses "words" (word is defined in a word class) from a file stream, inserts them into a table, sorted and unique (a table class and a B+tree class), and writes them out into another file stream, one per line. I used this one as the basis for indexing any text file.
>The first one is similar to what your routine does :
>
>#include <pro_ext.h>
>char FAR *NullTerminate(Value FAR *cVal)
>{
> char *RetVal;
> if (!_SetHandSize(cVal->ev_handle, cVal->ev_length + 1))
> {
> _Error(182); // "Insufficient memory"
> }
>
> ((char FAR *) _HandToPtr(cVal->ev_handle))[cVal->ev_length] = '\0';
> RetVal = (char FAR *) _HandToPtr(cVal->ev_handle);
> return RetVal;
>}
>
>void FAR Lst2Array(ParamBlk FAR *parm)
>{
> #define MAXLEN 254
> Value val;
> char FAR *InString = NullTerminate(&parm->p[0].val);
> char FAR *ArrayName = NullTerminate(&parm->p[1].val);
> int Slen = _StrLen(InString);
>
> Locator loc;
> NTI nti;
> if ( (nti = _NameTableIndex(ArrayName)) != -1 )
> {
> if ( _FindVar(nti, -1, &loc) ) // If exists - release
> _Release(nti);
> }
> int i=0;
> int count=0;
> while(InString[i] != '\0') // Count number of commas in list
> {
> if ( InString[i] == ',' )
> count++;
> i++;
> }
> count++; // List tokens = Commas + 1
>
>
> loc.l_subs = 1;
> loc.l_sub1 = count; // Array len is count
> loc.l_sub2 = 0;
> _NewVar(ArrayName,&loc,NV_PUBLIC); // Create one dimension public array
>
> val.ev_type = 'C';
> val.ev_length = MAXLEN;
> val.ev_handle = _AllocHand(MAXLEN);
> _HLock(val.ev_handle);
> char FAR *rep = (char FAR *) _HandToPtr(val.ev_handle);
> i=count=0;
> while(InString[i] != '\0') // Scan string
> {
> int j = 0 ;
> while (InString[i] != '\0' && InString[i] != ',' && j < MAXLEN)
> rep[j++] = InString[i++] ; // Copy current list element
> rep[j] = '\0'; // null terminate
>
> val.ev_length = j;
> count++;
> loc.l_sub1 = count; // Set next array elem - FP arrays start at 1
> _Store(&loc, &val); // Store to array elem
>
> if (InString[i] == ',') // Step over the delimiter
> i++;
> while (InString[i] == ' ') // Drop spaces before the next element
> i++;
> }
> _HUnLock(val.ev_handle);
> _FreeHand(val.ev_handle); // Release the scratch handle
>}
>
>FoxInfo myFoxInfo[] =
>{
> {"LIST2ARRAY", (FPFI) Lst2Array, 2, "CC"},
>};
>
>
>extern "C" {
>FoxTable _FoxTable =
>{
> (FoxTable *) 0, sizeof(myFoxInfo)/sizeof(FoxInfo), myFoxInfo
>};
>}
>If you have C++, you could build this into an FLL.
Could you please tell me exactly how to do that? (I've done this in the past using Watcom and FPD 2.6, but it was a while ago and I don't remember.) Can I use Visual C++ 6.0?
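As far as I know, Visual C++ works for this: an FLL is an ordinary Win32 DLL that includes pro_ext.h, links winapims.lib from VFP's Api directory, and is named with an .fll extension. A command-line sketch, with paths assumed for a default VFP 6 install (adjust to yours; not verified against the VC++ 6.0 project wizards):

```shell
# Hypothetical build recipe -- paths below are assumptions.
cl /LD /I"C:\Program Files\Microsoft Visual Studio\Vfp98\Api" str2arr.c ^
   "C:\Program Files\Microsoft Visual Studio\Vfp98\Api\winapims.lib" ^
   /link /out:str2arr.fll
```

Once built, `set library to str2arr.fll` loads it from VFP.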
>You could call it like yours :
>=list2Array("Some, text string, parser","aMyArray")
>It would create array "aMyArray" with elements :
>aMyArray[1] = "Some"
>aMyArray[2] = "text string"
>aMyArray[3] = "parser"
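For comparison outside FoxPro, here is a portable C sketch of the same tokenizing idea, minus the FoxPro API calls. The leading-space trim after each comma is my assumption, made so that " text string" comes back as "text string" as in the output above.

```c
#define MAXTOK 254  /* mirrors MAXLEN in the FLL */

/* Hypothetical standalone version of the Lst2Array loop: split a
 * comma-delimited list into out[0..max-1]; returns the token count. */
static int list_to_array(const char *in, char out[][MAXTOK + 1], int max)
{
    int count = 0, i = 0;
    while (in[i] != '\0' && count < max) {
        int j = 0;
        while (in[i] != '\0' && in[i] != ',' && j < MAXTOK)
            out[count][j++] = in[i++];   /* copy current element */
        out[count][j] = '\0';            /* null terminate */
        count++;
        if (in[i] == ',')                /* step over the delimiter */
            i++;
        while (in[i] == ' ')             /* drop spaces before next element */
            i++;
    }
    return count;
}
```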
>
>If you can try it and see the speed difference yourself :) Here is a test code and results on my computer :
>
>use _samples+"\data\orders"
>copy to temp.txt type delimited
>lcString = filetostr("temp.txt")
>
>start = seconds()
>dimension aWords[1]
>lnWords = aParser(@aWords, lcString, ",")
>? seconds()-start, lnWords
>
>start = seconds()
>set library to str2arr.fll additive
>=list2Array(lcString, "aMyArray")
>? seconds()-start, alen(aMyArray)
>
>Both return a word count of 18607.
>aParser() finished in 473.689 secs (and I thought I had a hang); list2array() took 0.278 secs to finish. My test might not be a good one but I see the logic difference like this :
>go top
>lnCurRec = 0
>do while !eof()
> go top
> for ix = 1 to lnCurRec
> skip
> endfor
> lnCurRec = lnCurRec + 1
> * Get field value
>enddo
>
>* What I like
>scan
> * Get field value
>endscan
In conclusion, I maintain that the foxtools wordnum() function is not suitable for sequential word parsing unless the string has only a few words.
>Cetin
If it's not broken, fix it until it is.