Parsing expressions defined by end users
From: 29/04/2016 04:09:03
To: 29/04/2016 03:13:41
Forum: Visual FoxPro
Category: Other
Environment: VFP 9 SP2
Thread ID: 01635536
Message ID: 01635623
>Thank you so much for your comments, Gregory. My notes point by point:

Answers are inline, Antonio.

>
>>
>>Antonio,
>>
>>It's a good starting point
>>(1) Ensure you take the longest match
>>eg
>>
>>
>>=
>>==
>>!
>>!=
>><
>><=
>>>
>>>=
>>
>>
>>You can do this by testing all patterns and taking the longest match, or by combining all the patterns into one
>
>Good point - and catch! Forgot about logical operators.
>
>Revised pattern for operators:
>
>m.loTokenizer.AddTokenPattern("(==|!=|<=|>=|<>|!|=|>|<|\+|-|\*|\/|%|,|\)|\(|\$)")
>
>
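A note on why the order inside the combined pattern matters. Backtracking regex engines such as Python's `re` try alternatives left to right and stop at the first one that succeeds, so they do not pick the longest alternative on their own. The sketch below is Python rather than VFP, purely for illustration, and assumes the engine behind AddTokenPattern() behaves the same way; the names SHORT_FIRST, LONG_FIRST and first_operator are made up for the example.

```python
import re

# Same idea, two orderings. SHORT_FIRST lists "=" before "==", "<" before "<=".
SHORT_FIRST = re.compile(r"^(=|==|!|!=|<|<=|>|>=)")
# The revised pattern from the post: two-character operators come first.
LONG_FIRST = re.compile(r"^(==|!=|<=|>=|<>|!|=|>|<|\+|-|\*|\/|%|,|\)|\(|\$)")

def first_operator(pattern, text):
    """Return the operator matched at the start of text, or None."""
    m = pattern.match(text)
    return m.group(1) if m else None

print(first_operator(SHORT_FIRST, "<= 10"))  # <
print(first_operator(LONG_FIRST, "<= 10"))   # <=
```

With the longer operators listed first, a single combined pattern gives the longest match without testing every pattern separately.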
>>
>>(2) When you trim: there are more white space chars than the space char, e.g. the TAB char, which you do not trim
>>To catch those, I would let the patterns start with ^ to make sure you start matching at the beginning of the string
>>
>>Also the ^ at the beginning of a pattern ensures you don't match somewhere in the middle
>
>Changed trimmers to catch all [accepted] white space:
>
>m.lcExpression = ALLTRIM(m.tcExpression,0," ",CHR(13),CHR(10),CHR(9))
>
>and
>
>m.lcExpression = LTRIM(SUBSTR(m.lcExpression,LEN(m.lcToken) + 1),0," ",CHR(13),CHR(10),CHR(9))
>
Not sure about the CHR(13) and CHR(10): you may need to preprocess the input to fold continuation lines first.
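A minimal sketch of such a preprocessing step, in Python for illustration only. It relies on VFP's convention that a semicolon at the end of a line continues the statement on the next line; fold_continuations is a hypothetical name, and the sketch ignores edge cases such as a semicolon that sits inside a comment.

```python
import re

# A semicolon, optional trailing blanks or TABs, then a line break.
_CONTINUATION = re.compile(r";[ \t]*(\r\n|\r|\n)")

def fold_continuations(source):
    """Join lines that end in ';' with the following line."""
    # The semicolon and the line break are replaced by a single space,
    # so the two halves never run together into one token.
    return _CONTINUATION.sub(" ", source)

folded = fold_continuations("m.lnPrice * ;\r\n    m.lnQty")
print(folded.split())
```

After folding, any CHR(13) or CHR(10) still left in the text is a real line end, which the tokenizer can then either reject or treat as white space.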

What will you do with comments? Preprocess them away, or have the lexer skip them?
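For the second option, one possible shape is a "skip" pattern that the lexer applies before it looks for the next token. The Python sketch below is an illustration under assumptions: it only knows white space and VFP's `&&` end-of-line comments (whole-line `*` comments are not handled), and skip_ignorable is a hypothetical helper name.

```python
import re

# One white space char (space, TAB, CR, LF) or an "&&" comment running to
# the end of the line, repeated as often as needed, anchored at the start.
_SKIP = re.compile(r"^(?:[ \t\r\n]|&&[^\r\n]*)+")

def skip_ignorable(text):
    """Drop leading white space and && comments; return the remainder."""
    return _SKIP.sub("", text, count=1)

print(skip_ignorable("  \t&& unit price first\r\n  m.lnPrice * 2"))
```

Because the skip happens only between tokens, an `&&` that appears inside a string literal is never mistaken for a comment, provided the string pattern has already consumed the whole literal.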

>All patterns are already set to the beginning of the expression. The AddTokenPattern() method makes sure of that by appending ^ at the start of pattern, if it is not already included.
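One thing to watch when a ^ is prepended automatically: in a pattern with top-level alternation, the ^ binds only to the first alternative. The operator pattern above is safe because it is wrapped in parentheses. The Python sketch below shows the difference; anchor_at_start is a hypothetical stand-in for what AddTokenPattern() does, not its actual code.

```python
import re

def anchor_at_start(pattern):
    """Return pattern anchored at the start of the input.

    Wrapping the pattern in a non-capturing group makes the ^ apply to
    every alternative, not just the first one.
    """
    if pattern.startswith("^"):
        return pattern  # already anchored by the caller
    return "^(?:" + pattern + ")"

naive = "^" + "==|="             # parsed as (^==) or (=)
safe = anchor_at_start("==|=")   # parsed as ^(== or =)

print(re.search(naive, "x = 1") is not None)  # True: "=" matched mid-string
print(re.search(safe, "x = 1") is not None)   # False
```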
>
>>
>>(3) You can easily add a pattern for a string
>
>Sorry, I don't understand this remark. There is already a pattern for strings, isn't there?


Yes, you are right. I overlooked that.

>
>>
>>Update
>>(4) You are not splitting into 'real' tokens
>>A token also tells which token class it is, eg number, string, word, operator
>>
>>To do that you need one or more patterns per token class
>>
>>eg
>>
>>NUMBER:("(\+|-)?(\.\d+|\d+(\.\d+)?)((e|E)(\+|-)?\d+)?")
>>
>>
>>And take the longest match
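A minimal sketch of "one or more patterns per token class, longest match wins", written in Python for illustration rather than as the actual tokenizer class. Only the NUMBER pattern comes from this thread (with the exponent group made optional instead of repeatable); the STRING, WORD and OPERATOR patterns, the class names and the tokenize function are simplified assumptions.

```python
import re

# (class name, pattern) pairs. match() below anchors at the start, like ^.
TOKEN_CLASSES = [
    ("NUMBER", re.compile(r"(\+|-)?(\.\d+|\d+(\.\d+)?)((e|E)(\+|-)?\d+)?")),
    ("STRING", re.compile(r'"[^"]*"' + "|" + r"'[^']*'")),
    ("WORD", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("OPERATOR", re.compile(r"==|!=|<=|>=|<>|!|=|>|<|\+|-|\*|/|%|,|\)|\(|\$")),
]
_WHITESPACE = " \t\r\n"

def tokenize(expression):
    """Split expression into (class, text) pairs; the longest match wins."""
    tokens = []
    rest = expression.lstrip(_WHITESPACE)
    while rest:
        best_class, best_text = None, ""
        for name, pattern in TOKEN_CLASSES:
            m = pattern.match(rest)
            # Keep the candidate only if it is strictly longer than the best so far.
            if m and len(m.group(0)) > len(best_text):
                best_class, best_text = name, m.group(0)
        if best_class is None:
            raise ValueError("Unrecognized input at: " + rest[:10])
        tokens.append((best_class, best_text))
        rest = rest[len(best_text):].lstrip(_WHITESPACE)
    return tokens

print(tokenize("abs(-1.5e3) >= 10"))
```

One consequence of keeping the sign inside NUMBER: in this sketch `3-1` comes out as two NUMBER tokens (`3` and `-1`), so either the parser copes with that or the sign moves out of the number pattern and is handled as an operator.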
>
>Will do that, it will ease the work of the post-tokenizer (for instance, just check on "FUNCTION" tokens for allowed functions).

I think that may be difficult to do.

All you can say is that it is a word:
abs = 3
?abs
When you match a 'word' you don't know for sure whether it is a function, a command or a variable.

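One way a post-tokenizer could deal with this, sketched in Python under assumptions: leave every such token as a plain WORD in the lexer and decide later, with one token of lookahead, whether it is used as a call. ALLOWED_FUNCTIONS, classify_words and the CALL/NAME labels are hypothetical, and the rule is only a heuristic, since in VFP a word followed by "(" can also be an array element.

```python
# Hypothetical whitelist of functions end users may call.
ALLOWED_FUNCTIONS = {"ABS", "MAX", "MIN"}

def classify_words(tokens):
    """Relabel WORD tokens as CALL or NAME using one token of lookahead.

    tokens is a list of (class, text) pairs as produced by a tokenizer.
    """
    result = []
    for i, (kind, text) in enumerate(tokens):
        if kind == "WORD":
            next_is_paren = i + 1 < len(tokens) and tokens[i + 1] == ("OPERATOR", "(")
            if next_is_paren:
                # VFP names are case-insensitive, so compare in upper case.
                if text.upper() not in ALLOWED_FUNCTIONS:
                    raise ValueError("Function not allowed: " + text)
                kind = "CALL"
            else:
                kind = "NAME"
        result.append((kind, text))
    return result

# "abs = 3": abs is just a name here.
print(classify_words([("WORD", "abs"), ("OPERATOR", "="), ("NUMBER", "3")]))
# "abs(x)": abs is used as a call.
print(classify_words([("WORD", "abs"), ("OPERATOR", "("), ("WORD", "x"), ("OPERATOR", ")")]))
```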
Gregory