Parsing expressions defined by end users
Message
From: 29/04/2016 01:05:39
To: 28/04/2016 20:11:20
General information
Forum: Visual FoxPro
Category: Other
Environment versions
Visual FoxPro: VFP 9 SP2
Miscellaneous
Thread ID: 01635536
Message ID: 01635604
Views: 65
>>In an application I'm working with, users can insert an expression to be used as a formula to evaluate parameter values. I'm relying on VFP's own parser to do this, but there are many issues involved, including stability and security issues.
>>
>>I'll need to strip down the parsing to only accept a much more confined set of functions, and to prevent access to variables and run-time objects (starting with _VFP and the like). I know that this can be done and how to do it, but I wonder if anybody has done this before, or knows of anything that has already been developed and is available. For instance, if you authorize your users to edit reports, how do you secure the expressions they insert as field values?
>
>I think I have something workable, right now, but would greatly appreciate your tests and remarks (am I missing anything? am I doing something wrong?).
>
>I hope this is self-contained, so that you only have to copy into a command editor and execute it.
>
>The tester asks for an expression and then tries to tokenize it. The next step, which I didn't cover here but which I think is quite simple, is to check whether each of the identified tokens is allowed or prohibited.
>
>Supported data types: numeric (decimal), strings, dates, boolean and .NULL. (will add others, later on).
>
>No variables allowed, but that can be changed if a proper pattern is added.
>
>
>LOCAL loTokenizer AS Tokenizer
>LOCAL lcToken AS String
>LOCAL lcTest AS String
>
>m.loTokenizer = CREATEOBJECT("Tokenizer")
>m.loTokenizer.AddTokenPattern("(\+|-)?(\.\d+|\d+(\.\d+)?)((e|E)(\+|-)?\d+)?")	&& numbers, with optional exponent
>m.loTokenizer.AddTokenPattern('("[^"]*"|' + "'[^']*'|\[[^\]]*\])")	&& delimited strings
>m.loTokenizer.AddTokenPattern("{\^\d{1,4}-\d{1,2}-\d{1,2}}")	&& strict date literals {^yyyy-mm-dd}
>m.loTokenizer.AddTokenPattern("\.(t|T|f|F|((n|N)(u|U)[lL]{2}))\.")	&& .T., .F., and .NULL.
>m.loTokenizer.AddTokenPattern("(\+|-|\*|\/|%|,|\))")	&& arithmetic operators, comma, closing parenthesis
>m.loTokenizer.AddTokenPattern("[a-zA-Z][a-zA-Z0-9]*\(")	&& function name followed by opening parenthesis
>
>CLEAR
>ACCEPT "Test Expression: " TO m.lcTest
>
>IF !m.loTokenizer.GetTokens(m.lcTest)
>
>	? "Error @" + m.loTokenizer.ErrorPointer
>
>ELSE
>
>	? "Tokens found:"
>	FOR EACH m.lcToken IN m.loTokenizer.Tokens
>		? m.lcToken
>	ENDFOR
>
>ENDIF
>
>DEFINE CLASS Tokenizer AS Custom
>
>	RegExpr = .NULL.
>	TokenPatterns = .NULL.
>	Tokens = .NULL.
>	ErrorPointer = ""
>	
>	FUNCTION Init
>
>		IF !"\_REGEXP.VCX" $ SET("Classlib")
>			SET CLASSLIB TO (ADDBS(HOME(1)) + "ffc\_regexp.vcx") ADDITIVE
>		ENDIF
>
>		This.RegExpr = CREATEOBJECT("_regexp")
>		
>		This.TokenPatterns = CREATEOBJECT("collection")
>		This.Tokens = CREATEOBJECT("collection")
>
>	ENDFUNC
>
>	FUNCTION AddTokenPattern (tcPattern AS String)
>
>		This.TokenPatterns.Add(IIF(LEFT(m.tcPattern,1) != "^", "^", "") + m.tcPattern)
>
>	ENDFUNC
>
>	FUNCTION GetTokens (tcExpression AS String)
>
>		LOCAL lcExpression AS String
>		LOCAL lcTokenPattern AS String
>		LOCAL lnTokenPattern AS Integer
>		LOCAL lcToken AS String
>		
>		m.lcExpression = ALLTRIM(m.tcExpression)
>		This.Tokens.Remove(-1)
>		This.ErrorPointer = ""
>
>		DO WHILE !EMPTY(m.lcExpression)
>
>			m.lcToken = .NULL.
>
>			FOR m.lnTokenPattern = 1 TO This.TokenPatterns.Count
>			
>				m.lcTokenPattern = This.TokenPatterns.Item(m.lnTokenPattern)
>					
>				This.RegExpr.Clear()
>				This.RegExpr.Pattern = m.lcTokenPattern
>
>				IF This.RegExpr.Execute(m.lcExpression,.F.) = 1
>
>					m.lcToken = This.RegExpr.Matches[1,2]
>					m.lcExpression = LTRIM(SUBSTR(m.lcExpression,LEN(m.lcToken) + 1))
>
>					This.Tokens.Add(m.lcToken)
>					EXIT
>
>				ENDIF
>			ENDFOR
>
>			IF ISNULL(m.lcToken)
>				This.ErrorPointer = m.lcExpression
>				RETURN .F.
>			ENDIF
>		ENDDO
>
>		RETURN .T.
>	ENDFUNC
>
>ENDDEFINE
>
Antonio,

It's a good starting point.

(1) Ensure you take the longest match, e.g. for these operators:
=
==
!
!=
<
<=
>
>=
You can do this by testing all the patterns and taking the longest match, or by combining all the patterns into one.
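For illustration, the longest-match rule could look like this outside VFP (a minimal Python sketch; the PATTERNS list is a made-up operator set for the example, not the patterns from the post):

```python
import re

# Hypothetical operator patterns; note that "<=" must win over "<",
# and "==" over "=", which is exactly what longest-match guarantees.
PATTERNS = [r"==", r"=", r"!=", r"!", r"<=", r"<", r">=", r">"]

def longest_match(text, pos=0):
    """Try every pattern anchored at pos and keep the longest hit (or None)."""
    best = None
    for pattern in PATTERNS:
        m = re.match(pattern, text[pos:])
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best
```

Because every pattern is tried and only the longest winner is kept, the order of the patterns in the list no longer matters.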

(2) When you trim, note that there are more white-space characters than just the space character; the TAB character, for example, is not trimmed.
To catch those, I would start each pattern with ^ to make sure matching begins at the start of the string.

A leading ^ also ensures the pattern doesn't match somewhere in the middle of the expression.
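As an alternative to trimming, the whitespace issue can also be handled inside the matching loop itself; a Python sketch of that idea (the \s+ class covers tabs, spaces, and line breaks, unlike a trim of the space character only):

```python
import re

# One whitespace run; \s matches space, TAB, CR, LF, form feed, etc.
WS = re.compile(r"\s+")

def skip_ws(text, pos):
    """Advance pos past any run of whitespace; return the new position."""
    m = WS.match(text, pos)
    return m.end() if m else pos
```

Calling this before each matching attempt replaces the LTRIM()-style cleanup and catches every whitespace character in one place.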

(3) You can easily add a pattern for a string.

Update

(4) You are not splitting into 'real' tokens yet: a token should also carry its token class, e.g. number, string, word, operator.

To do that, you need one or more patterns per token class, e.g.:
NUMBER: "(\+|-)?(\.\d+|\d+(\.\d+)?)((e|E)(\+|-)?\d+)*"
And again, take the longest match.
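Putting points (1) and (4) together, a classed longest-match tokenizer could be sketched like this (Python, not VFP; the NUMBER pattern is the one above with the exponent group made optional, and the other classes are illustrative guesses, not the ones from the post):

```python
import re

# One (class, pattern) pair per token class; patterns are anchored via re.match.
TOKEN_CLASSES = [
    ("NUMBER", r"(\+|-)?(\.\d+|\d+(\.\d+)?)((e|E)(\+|-)?\d+)?"),
    ("STRING", r'"[^"]*"|\'[^\']*\''),
    ("WORD",   r"[a-zA-Z][a-zA-Z0-9_]*"),
    ("OP",     r"==|!=|<=|>=|[=!<>+\-*/%(),]"),
]

def tokenize(text):
    """Return a list of (class, lexeme) pairs; the longest match across all classes wins."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():          # skip any whitespace, including TAB
            pos += 1
            continue
        best = None
        for cls, pattern in TOKEN_CLASSES:
            m = re.match(pattern, text[pos:])
            if m and (best is None or len(m.group(0)) > len(best[1])):
                best = (cls, m.group(0))
        if best is None:
            raise ValueError(f"cannot tokenize at position {pos}: {text[pos:]!r}")
        tokens.append(best)
        pos += len(best[1])
    return tokens
```

With classed tokens, the whitelist step Antonio mentions reduces to checking each WORD token against the list of allowed functions.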
Gregory