Parsing expressions defined by end users
Message
From: 28/04/2016 14:25:37
To: Lutz Scheffler (Lutz Scheffler Software Ingenieurbüro, Dresden, Germany), 28/04/2016 14:02:30

General information
Forum: Visual FoxPro
Category: Other

Environment versions
Visual FoxPro: VFP 9 SP2

Miscellaneous
Thread ID: 01635536
Message ID: 01635572
Views: 57
>>>>>>In an application I'm working with, users can insert an expression to be used as a formula to evaluate parameter values. I'm relying on VFP's own parser to do this, but there are many issues involved, including stability and security issues.
>>>>>>
>>>>>>I'll need to strip down the parsing to accept only a much more confined set of functions, and to prevent access to variables and run-time objects (starting with _VFP and the like). I know that this can be done and how to do it, but I wonder if anybody has done this before or knows of anything that has already been developed and is available. For instance, if you authorize your users to edit reports, how do you secure the expressions they insert as field values?
>>>>>
>>>>>This is a pain.
>>>>>
>>>>>What I can think of is:
>>>>>
>>>>>The basic step is to run the expression inside a TRY...CATCH for testing.
>>>>>After that it's a pain. (But better take the pain before the TRY...CATCH, or the harm is already done.)
>>>>>Get the expression and resolve it, typically by transforming it into (reverse) Polish notation, then check the operators and limit them to a set of allowed functions (after the transformation, any function or operator ends up as an operator). Possibly you can optimize by checking each operator against the list the moment you resolve it.
>>>>>The transformation into Polish notation is a well-known way to parse an expression; there is good documentation on the web.
>>>>>
>>>>>If you do this for reports, you have to do it for each and every field that can hold an expression:
>>>>>-fields
>>>>>-variables
>>>>>-grouping
>>>>>-PrintWhen
>>>>>-dynamics
>>>>>just to name some
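For what it's worth, a minimal sketch of that whitelist idea in VFP 9 might look like the routine below. SafeEval() is a made-up name, the allow list is only an example, and the sketch assumes double-quoted string literals only (anything it cannot classify is simply rejected). It is meant to show the shape of the check, not a finished implementation.

FUNCTION SafeEval( tcExpr )
    LOCAL lcAllowed, lcClean, lcWord, lnWord, luResult, loErr

    * Example allow list; anything not listed (variables, _VFP, objects) is rejected
    lcAllowed = ",ABS,MIN,MAX,ROUND,INT,IIF,UPPER,LOWER,LEN,"

    * Blank out double-quoted string literals so their contents are ignored
    lcClean = tcExpr
    DO WHILE OCCURS('"', lcClean) >= 2
        lcClean = STRTRAN(lcClean, '"' + STREXTRACT(lcClean, '"', '"') + '"', " ", 1, 1)
    ENDDO

    * Turn operators and punctuation into spaces so only words remain
    lcClean = CHRTRAN(lcClean, "+-*/^%()=<>!,$#", SPACE(15))

    * Every remaining word must be a number, a logical constant or an allowed function
    FOR lnWord = 1 TO GETWORDCOUNT(lcClean)
        lcWord = UPPER(GETWORDNUM(lcClean, lnWord))
        IF ISDIGIT(LEFT(lcWord, 1)) OR INLIST(lcWord, ".T.", ".F.", "AND", "OR", "NOT", ".AND.", ".OR.", ".NOT.")
            LOOP
        ENDIF
        IF NOT ("," + lcWord + "," $ lcAllowed)
            RETURN .NULL.        && unknown identifier: refuse to evaluate
        ENDIF
    ENDFOR

    * Only now hand the expression to VFP's own parser, inside TRY...CATCH
    TRY
        luResult = EVALUATE(tcExpr)
    CATCH TO loErr
        luResult = .NULL.
    ENDTRY
    RETURN luResult
ENDFUNC

You would call it as luValue = SafeEval(lcUserExpression) and treat .NULL. as "not accepted".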
>>>>
>>>>Thank you for your input, Lutz.
>>>>
>>>>Fortunately, there are no reports involved, but since I know people may let users edit reports, I hoped someone would have faced this problem before and addressed it in some form.
>>>>
>>>>The purpose of the process I'm looking for is to limit access to potentially dangerous objects and functions. So, I think that a simple de-tokenization that can identify authorized functions and constants of the supported data types will do. The evaluation will be left to the VFP parser (with a proper error handler).
>>>
>>>I have no idea what you mean by de-tokenization here :(
>>
>>If I'm not mistaken, he's basically saying that once you've parsed the input string into a sequence of tokens, you can perform a lexical analysis by traversing the parse tree to identify the identifiers and check them against a list of what you want to allow or prohibit.
>
>Then I do not understand the whole thread. If it's parsed, most of the work is done?

Up to a point. The tokenization (breaking the original input apart into pieces that represent symbolic components, e.g. identifier names and operators) is only one of the stages. The next stage is to build a (logical) parse tree that captures the relationships between those tokens, some of which are driven by the context of the symbolic information found; this in turn can be used to check validity (e.g. to make sure the syntax is OK).
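To make the stages concrete, here is a rough sketch of just the tokenizing stage in VFP 9. It is an illustration only (TokenizeExpr is a made-up name, and string literals and constants such as .T. are not handled); the point is that this stage merely labels the pieces, and the parse-tree stage that follows is where the context checks and syntax validation would happen.

FUNCTION TokenizeExpr( tcExpr )
    LOCAL loTokens, lnPos, lnLen, lcChar, lcTok
    loTokens = CREATEOBJECT("Collection")    && ordered list of "TYPE value" strings
    lnLen = LEN(tcExpr)
    lnPos = 1
    DO WHILE lnPos <= lnLen
        lcChar = SUBSTR(tcExpr, lnPos, 1)
        DO CASE
        CASE lcChar == " "
            lnPos = lnPos + 1                && skip whitespace
        CASE ISDIGIT(lcChar)
            * NUMBER: consume a run of digits and decimal points
            lcTok = ""
            DO WHILE lnPos <= lnLen
                lcChar = SUBSTR(tcExpr, lnPos, 1)
                IF NOT (ISDIGIT(lcChar) OR lcChar == ".")
                    EXIT
                ENDIF
                lcTok = lcTok + lcChar
                lnPos = lnPos + 1
            ENDDO
            loTokens.Add("NUMBER " + lcTok)
        CASE ISALPHA(lcChar) OR lcChar == "_"
            * IDENTIFIER: function, variable or object name; a later stage
            * checks these against the allow list
            lcTok = ""
            DO WHILE lnPos <= lnLen
                lcChar = SUBSTR(tcExpr, lnPos, 1)
                IF NOT (ISALPHA(lcChar) OR ISDIGIT(lcChar) OR lcChar == "_")
                    EXIT
                ENDIF
                lcTok = lcTok + lcChar
                lnPos = lnPos + 1
            ENDDO
            loTokens.Add("IDENT " + UPPER(lcTok))
        OTHERWISE
            loTokens.Add("OP " + lcChar)     && single-character operator or parenthesis
            lnPos = lnPos + 1
        ENDCASE
    ENDDO
    RETURN loTokens
ENDFUNC

The next stage would walk this token list, build the tree (or convert it to reverse Polish notation as Lutz suggested) and reject anything whose IDENT is not on the approved list.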

Back in the DOS days, one of the sample programs you got with a copy of Turbo C was a mini spreadsheet. The sample contained a set of scripts you'd use with lex and yacc (programs you'd find on Unix systems). Since lex and yacc aren't generally found on DOS systems, they also provided copies of the C programs generated from those scripts. The lex script was basically the specification of a state engine that breaks the input into tokens (the lexical analysis part), and the yacc script defined the grammar, i.e. how the tokens are arranged and logically related to each other. These scripts were what enabled the functionality where you could enter expressions into the individual cells. IIRC a version of the same sample spreadsheet program was available in Turbo Pascal (which utilized the separately compiled OBJ files generated by Turbo C).

One of the more novel uses of lex and yacc I'd seen was in the creation of the program "spew", a random text generator driven by a rules file that specified the structure of the output. Lex was used to tokenize the rules file, and yacc was used to generate the parser that made sense of the token stream and drove the text generator. The standard rules file generated headlines that read like the stuff you'd see in the National Enquirer. Rather clever coding even let you encode rules for pluralizing nouns and for producing output that properly handled past/present/future tense.