Parsing expressions defined by end users
Message
From: 29/04/2016 01:05:39
To: 28/04/2016 20:11:20
General information
Forum: Visual FoxPro
Category: Other
Environment versions
Visual FoxPro: VFP 9 SP2
Miscellaneous
Thread ID: 01635536
Message ID: 01635604
Views: 65
>>In an application I'm working with, users can insert an expression to be used as a formula to evaluate parameter values. I'm relying on VFP's own parser to do this, but there are many issues involved, including stability and security issues.
>>
>>I'll need to strip down the parsing to only accept a much more confined set of functions, and to prevent access to variables and run-time objects (starting with _VFP and the like). I know that this can be done and how to do it, but I wonder if anybody has done this before, or knows of anything that has already been developed and is available. For instance, if you authorize your users to edit reports, how do you secure the expressions they insert as field values?
>
>I think I have something workable, right now, but would greatly appreciate your tests and remarks (am I missing anything? am I doing something wrong?).
>
>I hope this is self-contained, so that you only have to copy into a command editor and execute it.
>
>The tester asks for an expression and then tries to tokenize it. The next step, which I didn't cover here but which I think is quite simple, is to check whether each of the identified tokens is allowed or prohibited.
>
>Supported data types: numeric (decimal), strings, dates, boolean and .NULL. (will add others, later on).
>
>No variables allowed, but that can be changed if a proper pattern is added.
>
>
>LOCAL loTokenizer AS Tokenizer
>LOCAL lcToken AS String
>LOCAL lcTest AS String
>
>m.loTokenizer = CREATEOBJECT("Tokenizer")
>m.loTokenizer.AddTokenPattern("(\+|-)?(\.\d+|\d+(\.\d+)?)((e|E)(\+|-)?\d+)?")	&& numbers, with optional exponent
>m.loTokenizer.AddTokenPattern('("[^"]*"|' + "'[^']*'|\[[^\]]*\])")	&& delimited strings
>m.loTokenizer.AddTokenPattern("{\^\d{1,4}-\d{1,2}-\d{1,2}}")	&& strict date literals {^yyyy-mm-dd}
>m.loTokenizer.AddTokenPattern("\.(t|T|f|F|((n|N)(u|U)[lL]{2}))\.")	&& .T., .F., and .NULL.
>m.loTokenizer.AddTokenPattern("(\+|-|\*|\/|%|,|\))")	&& arithmetic operators, comma, closing parenthesis
>m.loTokenizer.AddTokenPattern("[a-zA-Z][a-zA-Z0-9]*\(")	&& function name followed by opening parenthesis
>
>CLEAR
>ACCEPT "Test Expression: " TO m.lcTest
>
>IF !m.loTokenizer.GetTokens(m.lcTest)
>
>	? "Error @" + m.loTokenizer.ErrorPointer
>
>ELSE
>
>	? "Tokens found:"
>	FOR EACH m.lcToken IN m.loTokenizer.Tokens
>		? m.lcToken
>	ENDFOR
>
>ENDIF
>
>DEFINE CLASS Tokenizer AS Custom
>
>	RegExpr = .NULL.
>	TokenPatterns = .NULL.
>	Tokens = .NULL.
>	ErrorPointer = ""
>	
>	FUNCTION Init
>
>		IF !"\_REGEXP.VCX" $ SET("Classlib")
>			SET CLASSLIB TO (ADDBS(HOME(1)) + "ffc\_regexp.vcx") ADDITIVE
>		ENDIF
>
>		This.RegExpr = CREATEOBJECT("_regexp")
>		
>		This.TokenPatterns = CREATEOBJECT("collection")
>		This.Tokens = CREATEOBJECT("collection")
>
>	ENDFUNC
>
>	FUNCTION AddTokenPattern (tcPattern AS String)
>
>		This.TokenPatterns.Add(IIF(LEFT(m.tcPattern,1) != "^", "^", "") + m.tcPattern)
>
>	ENDFUNC
>
>	FUNCTION GetTokens (tcExpression AS String)
>
>		LOCAL lcExpression AS String
>		LOCAL lcTokenPattern AS String
>		LOCAL lnTokenPattern AS Integer
>		LOCAL lcToken AS String
>		
>		m.lcExpression = ALLTRIM(m.tcExpression)
>		This.Tokens.Remove(-1)
>		This.ErrorPointer = ""
>
>		DO WHILE !EMPTY(m.lcExpression)
>
>			m.lcToken = .NULL.
>
>			FOR m.lnTokenPattern = 1 TO This.TokenPatterns.Count
>			
>				m.lcTokenPattern = This.TokenPatterns.Item(m.lnTokenPattern)
>					
>				This.RegExpr.Clear()
>				This.RegExpr.Pattern = m.lcTokenPattern
>
>				IF This.RegExpr.Execute(m.lcExpression,.F.) = 1
>
>					m.lcToken = This.RegExpr.Matches[1,2]
>					m.lcExpression = LTRIM(SUBSTR(m.lcExpression,LEN(m.lcToken) + 1))
>
>					This.Tokens.Add(m.lcToken)
>					EXIT
>
>				ENDIF
>			ENDFOR
>
>			IF ISNULL(m.lcToken)
>				This.ErrorPointer = m.lcExpression
>				RETURN .F.
>			ENDIF
>		ENDDO
>
>		RETURN .T.
>	ENDFUNC
>
>ENDDEFINE
>
Antonio,

It's a good starting point.

(1) Ensure you take the longest match, e.g. for these operators:
=
==
!
!=
<
<=
>
>=
You can do this by testing all the patterns and taking the longest match, or by combining all the patterns into one.
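For illustration, the longest-match rule could look like this outside VFP (a minimal Python sketch; the PATTERNS list is a made-up operator set for the example, not the patterns from the post):

```python
import re

# Hypothetical operator patterns; note that "<=" must win over "<",
# and "==" over "=", which is exactly what longest-match guarantees.
PATTERNS = [r"==", r"=", r"!=", r"!", r"<=", r"<", r">=", r">"]

def longest_match(text, pos=0):
    """Try every pattern anchored at pos and keep the longest hit (or None)."""
    best = None
    for pattern in PATTERNS:
        m = re.match(pattern, text[pos:])
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best
```

Because every pattern is tried and only the longest winner is kept, the order of the patterns in the list no longer matters.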

(2) When you trim, note that there are more white-space characters than just the space character; the TAB character, for example, is not trimmed.
To catch those, I would start each pattern with ^ to make sure matching begins at the start of the string.

A leading ^ also ensures the pattern doesn't match somewhere in the middle of the expression.
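As an alternative to trimming, the whitespace issue can also be handled inside the matching loop itself; a Python sketch of that idea (the \s+ class covers tabs, spaces, and line breaks, unlike a trim of the space character only):

```python
import re

# One whitespace run; \s matches space, TAB, CR, LF, form feed, etc.
WS = re.compile(r"\s+")

def skip_ws(text, pos):
    """Advance pos past any run of whitespace; return the new position."""
    m = WS.match(text, pos)
    return m.end() if m else pos
```

Calling this before each matching attempt replaces the LTRIM()-style cleanup and catches every whitespace character in one place.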

(3) You can easily add a pattern for a string.

Update

(4) You are not splitting into 'real' tokens yet: a token should also carry its token class, e.g. number, string, word, operator.

To do that, you need one or more patterns per token class, e.g.:
NUMBER: "(\+|-)?(\.\d+|\d+(\.\d+)?)((e|E)(\+|-)?\d+)*"
And again, take the longest match.
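Putting points (1) and (4) together, a classed longest-match tokenizer could be sketched like this (Python, not VFP; the NUMBER pattern is the one above with the exponent group made optional, and the other classes are illustrative guesses, not the ones from the post):

```python
import re

# One (class, pattern) pair per token class; patterns are anchored via re.match.
TOKEN_CLASSES = [
    ("NUMBER", r"(\+|-)?(\.\d+|\d+(\.\d+)?)((e|E)(\+|-)?\d+)?"),
    ("STRING", r'"[^"]*"|\'[^\']*\''),
    ("WORD",   r"[a-zA-Z][a-zA-Z0-9_]*"),
    ("OP",     r"==|!=|<=|>=|[=!<>+\-*/%(),]"),
]

def tokenize(text):
    """Return a list of (class, lexeme) pairs; the longest match across all classes wins."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():          # skip any whitespace, including TAB
            pos += 1
            continue
        best = None
        for cls, pattern in TOKEN_CLASSES:
            m = re.match(pattern, text[pos:])
            if m and (best is None or len(m.group(0)) > len(best[1])):
                best = (cls, m.group(0))
        if best is None:
            raise ValueError(f"cannot tokenize at position {pos}: {text[pos:]!r}")
        tokens.append(best)
        pos += len(best[1])
    return tokens
```

With classed tokens, the whitelist step Antonio mentions reduces to checking each WORD token against the list of allowed functions.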
Gregory