Parsing expressions defined by end users

Level Extreme platform

Message

From: Gregory Adam, Belgium (29/04/2016 04:09:03)
In reply to: Antonio Lopes, BookMARC, Coimbra, Portugal (29/04/2016 03:13:41)

General information

Forum: Visual FoxPro
Category: Other
Title: Re: Parsing expressions defined by end users

Environment versions

Visual FoxPro: VFP 9 SP2

Miscellaneous

Thread ID: 01635536
Message ID: 01635623

>Thank you so much for your comments, Gregory. My notes point by point:

Answers are inline, Antonio.

>
>>
>>Antonio,
>>
>>It's a good starting point
>>(1) Ensure you take the longest match
>>e.g.
>>
>>=
>>==
>>!
>>!=
>><
>><=
>>>
>>>=
>>
>>You can do this by testing all patterns and taking the longest match, or by combining all the patterns into one
>
>Good point - and catch! Forgot about logical operators.
>
>Revised pattern for operators:
>

>m.loTokenizer.AddTokenPattern("(==|!=|<=|>=|<>|!|=|>|<|\+|-|\*|\/|%|,|\)|\(|\$)")
>
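
The order of the alternatives in that pattern matters: a backtracking regex engine tries them left to right and stops at the first one that fits, so the two-character operators have to come before their one-character prefixes. A minimal sketch of the effect, written in Python purely for illustration (the thread does not say which regex engine sits behind m.loTokenizer, and `first_operator` is a hypothetical helper):

```python
import re

# Two-character operators come first, as in the revised pattern above.
LONGEST_FIRST = re.compile(r"^(==|!=|<=|>=|<>|!|=|>|<)")
# Same operators with the short ones listed first (for comparison only).
SHORTEST_FIRST = re.compile(r"^(!|=|>|<|==|!=|<=|>=|<>)")

def first_operator(pattern, text):
    """Return the operator matched at the start of text, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None

print(first_operator(LONGEST_FIRST, "<= 10"))   # <=
print(first_operator(SHORTEST_FIRST, "<= 10"))  # <
```

With the short alternatives first, "<=" is split into "<" followed by "=", which is exactly the mismatch point (1) warns about.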

>
>>
>>(2) When you trim, remember that there are more white space chars than the space char, e.g. the TAB char, which you do not trim
>>I would also let the patterns start with ^ to make sure you start matching at the beginning of the string
>>
>>The ^ at the beginning of a pattern ensures you don't match somewhere in the middle
>
>Changed trimmers to catch all [accepted] white space:
>

>m.lcExpression = ALLTRIM(m.tcExpression,0," ",CHR(13),CHR(10),CHR(9))
>

>and
>

>m.lcExpression = LTRIM(SUBSTR(m.lcExpression,LEN(m.lcToken) + 1),0," ",CHR(13),CHR(10),CHR(9))
>

I'm not sure about CHR(13) and CHR(10): you may need to preprocess the expression to fold continuation lines.

What will you do with comments? Preprocess them away, or have the lexer skip them?
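
Folding continuation lines before tokenizing could look roughly like the following. This is only a sketch in Python for illustration, under the assumption that a ";" followed by nothing but blanks or tabs and a line break marks a continuation, as in VFP source code; `fold_continuations` is a hypothetical name, and comments are not handled here.

```python
import re

# Assumption: ";" + optional blanks/tabs + a line break (CRLF, CR or LF)
# marks a continuation line, as in VFP source code.
CONTINUATION = re.compile(r";[ \t]*(?:\r\n|\r|\n)")

def fold_continuations(source):
    """Join continued lines into one logical line (hypothetical helper)."""
    return CONTINUATION.sub(" ", source)

folded = fold_continuations("price * ;\r\n    quantity")
print(folded.split())  # ['price', '*', 'quantity']
```

Any CR or LF that is left after this step is a real line break, so the tokenizer can decide separately whether to treat it as white space or as an end of statement.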

>All patterns are already anchored to the beginning of the expression. The AddTokenPattern() method makes sure of that by prepending ^ to the pattern, if it is not already included.
>
>>
>>(3) You can easily add a pattern for a string
>
>Sorry, I don't understand this remark. There is already a pattern for strings, isn't there?

Yes, you are right. I overlooked that.

>
>>
>>Update
>>(4) You are not splitting into 'real' tokens
>>A token also carries its token class, e.g. number, string, word, operator
>>
>>To do that you need one or more patterns per token class
>>
>>e.g.
>>

>>NUMBER:("(\+|-)?(\.\d+|\d+(\.\d+)?)((e|E)(\+|-)?\d+)?")
>>

>>
>>And take the longest match
>
>Will do that, it will ease the work of the post-tokenizer (for instance, just check on "FUNCTION" tokens for allowed functions).

I think that may be difficult to do

All you can say is that it is a word:

abs = 3
?abs

When you match a 'word' you don't know for sure if it is a function, command or a variable
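
Putting points (1), (2) and (4) together, a tokenizer that tries one pattern per token class at the current position and keeps the longest match might be sketched like this. It is Python for illustration only; the class names, the patterns and the `tokenize` function are assumptions, not the tokenizer discussed in this thread, and the sign of a number is left to the parser here.

```python
import re

# Illustrative token classes; names and patterns are assumptions.
TOKEN_CLASSES = [
    ("NUMBER", re.compile(r"(\.\d+|\d+(\.\d+)?)([eE][+-]?\d+)?")),
    ("WORD", re.compile(r"[A-Za-z_]\w*")),
    ("OPERATOR", re.compile(r"==|!=|<=|>=|<>|!|=|>|<|\+|-|\*|/|%|,|\)|\(|\$")),
]
WHITESPACE = re.compile(r"[ \t\r\n]+")

def tokenize(expression):
    """Split expression into (token class, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(expression):
        # Skip all accepted white space, not only the space char.
        blank = WHITESPACE.match(expression, pos)
        if blank:
            pos = blank.end()
            continue
        # Try every class at the current position and keep the longest match.
        best_class, best_text = None, ""
        for name, pattern in TOKEN_CLASSES:
            found = pattern.match(expression, pos)
            if found and len(found.group(0)) > len(best_text):
                best_class, best_text = name, found.group(0)
        if best_class is None:
            raise ValueError(f"unrecognized input at position {pos}")
        tokens.append((best_class, best_text))
        pos += len(best_text)
    return tokens

print(tokenize("abs = 3"))
# [('WORD', 'abs'), ('OPERATOR', '='), ('NUMBER', '3')]
print(tokenize("abs(3)"))
# [('WORD', 'abs'), ('OPERATOR', '('), ('NUMBER', '3'), ('OPERATOR', ')')]
```

In both calls "abs" comes out as a plain WORD. Whether it names a variable or a function only becomes clear from the tokens around it, so that decision belongs to the step after the tokenizer.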

Gregory
