>Hi everybody,
>
>I'm trying to work out an algorithm for expression parser. A while ago I read something about it in Knut, but I could not find this book at the moment, besides, my expressions are much easier.
>
>Let me try to describe the problem.
>
>The expression may include units, such as sf, lb, etc.
>
>It also may include variables that correspond to properties of the class, say,
>
>cw - this.nCW
>
>f - this.nFloor
>
>c - this.cCeiling
>
>w - this.nWalls
>
>etc.
>
>Examples of expressions:
>
>cf is the same as c*f = this.cCeiling * this.nFloor
>
>2f the same as 2 * this.nFloor
>
>2@ is the same as 2
>
>2sf is the same as 2 (units)
>
>multiplication may be expressed by *, by x or just by placing var var (no space) or Number var or var number.
>
>There could be multiple parts, such as, I think:
>
>sf + cf
>
>This would be 1 + this.nCeiling * this.nFloor
>
>My original idea based on limited number of examples was to first remove all units from string, replace x with * and then work with variables translating them into class properties and inserting * if needed.
>
>I'm wondering if you can give me some direction about approaching this problem.
>
>Thanks.
>Hi everybody,
>
>I'm trying to work out an algorithm for expression parser. A while ago I read something about it in Knut, but I could not find this book at the moment, besides, my expressions are much easier.
>
>Let me try to describe the problem.
>
>The expression may include units, such as sf, lb, etc.
>
>It also may include variables that correspond to properties of the class, say,
>
>cw - this.nCW
>
>f - this.nFloor
>
>c - this.cCeiling
>
>w - this.nWalls
>
>etc.
>
>Examples of expressions:
>
>cf is the same as c*f = this.cCeiling * this.nFloor
>
>2f the same as 2 * this.nFloor
>
>2@ is the same as 2
>
>2sf is the same as 2 (units)
>
>multiplication may be expressed by *, by x or just by placing var var (no space) or Number var or var number.
>
>There could be multiple parts, such as, I think:
>
>sf + cf
>
>This would be 1 + this.nCeiling * this.nFloor
>
>My original idea based on limited number of examples was to first remove all units from string, replace x with * and then work with variables translating them into class properties and inserting * if needed.
>
>I'm wondering if you can give me some direction about approaching this problem.
>
>Thanks.
__________________________________________________________________________________________________
(1) multiplication may be expressed by *, by x or just by placing var var (no space) or Number var or var number
Difficult
2xy
is this 2 * x * y or 2 * y ?
y2
is this Y * 2 or a variable
3y is ok, since you have a number preceding the x
3e2
is this 3 * e * 2 or 300 ?
3aw
is this 3 * aw or 3 * a * w ?
(2) First you have to make an algorithm which breaks the 'input stream' into tokens (variables, operators, parens)
You may even accept [] as parens besides the usual ()
Therefore
- you have to define what syntax a variable can have, eg RegEx : [_a-zA-Z][_0-9a-zA-z]*
- what operators there are, eg * / + - ^
- be able to distinguish between a minus as an operator (eg in 1-3) and a unary minus (eg -3)
3) Token: following are all tokens
I make a distinction between
- an operator : + - * / ^
- unary minus : - (which is an operator with higher precedence than any other operator)
- a symbol or operand: variable or number
- parenthesis
To find out whether an infix expression is valid, you have to convert it to rpn (Reverse Polish Notation )
infix: 1 - 3
rpn: 1 3 -
4) for each of the above define
- an input precedence
- a stack precedence
- a rank
Input precedence : precedence in the input stream
Stack precedence: precedence when the token has been pushed on the stack
Rank:
- for an operator : 1 - number of operands ( eg for + it becomes 1 - 2 = -1 and for unary operators the ranks is 1 - 1 = 0)
- for an operand it is 1
- for parens it is undefined
example
An expression is well formed if
(1) Its Rank equals 1
(2) at no time does the rank of an expression < 1
Algo:
(1) a function (Token_Next(@m.Token)) which reads the input stream and returns a token each time it is called
return values: TRUE = I have a token, FALSE = no token or EndOfStream
(2)
Initialize a lookup array with all the operators (including unary minus), left parens, right parens (not symbol nor LeftInternal[ and RightInternal]) and columns for their input precedence, stack precedence and ranks
Initialize a stack
=Stack.push(LeftInternal[)
Rank = 0
EndOfInputStream = FALSE
Success = TRUE
do while m.Success
do case
case EndOfInputStream
exit
case !Token_next(@Token)
EndOfInputStream = TRUE
s_InputPrecedence = -2
s_StackPrecedence = undefined
s_Rank = undefined
case Token in (LeftInternal[, RightInternal])
Success = FALSE
case lookup of token in array succeeds
s_InputPrecedence = the input precedence of Token
s_StackPrecedence = the stack precedence of Token
s_Rank = the rank of Token
otherwise
s_InputPrecedence = 9
s_StackPrecedence = 10
s_Rank = 1
endcase
do while m.Success
do case
case !Stack.Peek( , , @TopStack_SP)
Success = FALSE
case s_StackPrecedence >= m.TopStack_SP
exit
case !m.this.Stack.Pop(@p_Symbol, @p_IP, @p_SP, @p_Rank)
Success = FALSE
case p_Rank is undefined
Success = FALSE
case !Output( p_symbol )
Success = FALSE
otherwise
Rank = m.Rank + m.p_Rank
if( m.Rank < 1 )
Success = FALSE
endif
endcase
enddo
do case
case !m.Success
case (m.s_InputPrecedence == m.TopStack_SP)
if( !m.this.Stack.Pop() )
Success = FALSE
endif
case UNDEFINED(m.s_Rank) and UNDEFINED(s_StackPrecedence)
Success = FALSE
case !m.this.Stack.Push(Token, m.s_InputPrecedence, s_StackPrecedence, m.s_Rank)
Success = FALSE
endcase
enddo
do case
case !m.Success
case !m.this.Stack.Empty()
Success = FALSE
case m.Rank <> 1
Success = FALSE
endcase
Notes
(1) this is basically what a compiler does. a + b * c is converted to a b c * +
(2) To distinguish between a unary minus and a binary minus: a unary minus is at the beginning of the expression or immediatly after a left parens
(3) if you support expressions like 3ab meaning 3 * ab, you may have to return the extra token * in between
(4) for further explanations, google for infix to rpn algo's
Gregory