ANTLR is a fantastic library that you can use to generate tokenizer and parser code by writing rules that resemble regular expressions.

While the simplicity of the original parser is lost, ANTLR makes up for it in flexibility and ease in parsing complex patterns.

I won't go deep into ANTLR basics here. I started with a preexisting grammar file for C# and looked to http://programming-pages.com/2012/06/28/antlr-with-c-a-simple-grammar/ for the basics.

Instead I will focus on a specific case: how to add a custom operator to Expression Evaluator. This is an actual change I implemented for user ehagen's fork, which adds a custom operator that acts as a shortcut for String.Contains. The clone URL of the fork is https://git01.codeplex.com/forks/ehagen/extraoperators.

ehagen customized his build of ExpressionEvaluator with a string.contains operator such that string1 ~= string2 would compile to a custom function Compare(string1, string2) which would return true if string1 contained string2. It was fairly simple to do in 1.x, but ehagen wanted to know if it could be done in 2.x.

In general a match is defined like this:

access_modifier pattern_name returns [type propertyname]:
   pattern_to_match { code_to_execute };

The pattern to match can be the literal text to match or other named patterns. Stripped of code and other semantics, the rule for additive_expression looks like this:

public additive_expression returns [Expression value]:
	multiplicative_expression (('+'|'-') multiplicative_expression)*;
...

It reads: declare a public function and rule named additive_expression that has a return property named value of type Expression.

It goes on to define a match rule that says: match using the rule multiplicative_expression, then match a group consisting of a plus or minus followed by another multiplicative_expression, where the group can occur zero or more times (the asterisk at the end). Those familiar with Regex will feel right at home.
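For readers coming from regular expressions, the shape of this rule can be compared with a pattern of the form operand ((+|-) operand)*. The snippet below is only an analogy and is not part of ExpressionEvaluator: it uses \d+ as a stand-in for multiplicative_expression, and the class name RegexAnalogy is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexAnalogy
{
    // \d+ stands in for multiplicative_expression (an assumption made for the analogy);
    // ([+-]\d+)* mirrors (('+'|'-') multiplicative_expression)*
    private static readonly Regex Additive = new Regex(@"^\d+([+-]\d+)*$");

    public static void Main()
    {
        Console.WriteLine(Additive.IsMatch("1"));     // prints True: the group repeats zero times
        Console.WriteLine(Additive.IsMatch("1+2-3")); // prints True: the group repeats twice
        Console.WriteLine(Additive.IsMatch("1+"));    // prints False: operator without a right operand
    }
}
```

The analogy stops at the shape of the pattern. A grammar rule can refer to other rules by name, which a plain regular expression cannot do.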

Looking at the entire additive_expression pattern with code:

public additive_expression returns [Expression value]:
	lex=multiplicative_expression {
		$value = $lex.value;
	}
	(op=('+'|'-') rex=multiplicative_expression
	{
		switch($op.text) 
		{ 
			case "+": $value = ExpressionHelper.BinaryOperator($value, $rex.value, ExpressionType.Add); break;
			case "-": $value = ExpressionHelper.BinaryOperator($value, $rex.value, ExpressionType.Subtract); break;
		};
	}
	)*;

You may place code at any point in the match by enclosing it in curly braces. The code is executed when the match reaches that point.

Note that when a named pattern is used several times in a match, you need to alias each instance for use in code in order to resolve ambiguity. Here lex and rex perform that function.

Now when we refer to our named patterns in code, we have to prefix with the dollar sign $. Also the actual object generated in code is not the return value declared but an ANTLR object. This object has several properties we are interested in. One is the return property we named in the rule declaration, here we named it value. The return property doesn't have to be named value across all rules. I just use it for consistency. Another property that the ANTLR object has that would be useful is the text property. The code $lex.text for example would return the actual string text matched by $lex.

	// match the rule multiplicative_expression and alias it as lex
	lex=multiplicative_expression
	{
		// Store it as the current return value. If this is NOT a binary expression,
		// the rule simply returns the Expression generated by multiplicative_expression.
		$value = $lex.value;
	}

	// start a group match
	(
	// match the plus or minus sign and store it as alias "op",
	// then match the right-hand operand and alias it as rex
	op=('+'|'-') rex=multiplicative_expression

	{
		switch($op.text)
		{
			case "+": $value = ExpressionHelper.BinaryOperator($value, $rex.value, ExpressionType.Add); break;
			case "-": $value = ExpressionHelper.BinaryOperator($value, $rex.value, ExpressionType.Subtract); break;
		};
	}
	// close the group; the asterisk repeats it zero or more times
	)*;

So how do I add my custom operator?

Under the Conditional Expression section you will find the binary operators.
To encode operator precedence, expressions are chained, i.e. one expression rule calls the next. Operator precedence is encoded in "reverse": multiplicative_expression, which binds tightest, is the last one in the binary operator chain and is at the top. It calls unary_expression for its left and right expressions.

public multiplicative_expression returns [Expression value]:
	lex=unary_expression {
		$value = $lex.value;
	}
	(op=('*'|'/'|'%') rex=unary_expression
...

Below it, additive expression calls multiplicative expression.

public additive_expression returns [Expression value]:
	lex=multiplicative_expression {
		$value = $lex.value;
	}
	(op=('+'|'-') rex=multiplicative_expression
	{
...

I assumed that your operators come right after the equality expression in the chain, so I inserted a new rule there and updated the preceding rule accordingly.

public stringcontains_expression returns [Expression value]:
lex=equality_expression {
	$value = $lex.value;
}
(op=('!~'|'~'|'~~'|'~=') rex=equality_expression
{
	switch($op.text) 
	{ 
		case "!~": $value = CustomExpressions.NotContains($value, $rex.value); break;
		case "~": case "~~": case "~=": $value = CustomExpressions.Contains($value, $rex.value); break;
	};
})*;
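The CustomExpressions class itself is not shown on this page. As a rough sketch of what Contains and NotContains could look like, the code below builds the calls with System.Linq.Expressions. It assumes both operands are already string-typed expressions. The class body is a hypothetical stand-in, not the code from the fork.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-in for the fork's CustomExpressions class; the real one may differ.
public static class CustomExpressions
{
    // string.Contains(string) from the BCL, looked up once and reused.
    private static readonly MethodInfo StringContains =
        typeof(string).GetMethod("Contains", new[] { typeof(string) });

    // Builds the expression tree for: left.Contains(right)
    // Assumes both operands are of type string.
    public static Expression Contains(Expression left, Expression right)
    {
        return Expression.Call(left, StringContains, right);
    }

    // Builds the expression tree for: !left.Contains(right)
    public static Expression NotContains(Expression left, Expression right)
    {
        return Expression.Not(Contains(left, right));
    }
}

public static class Demo
{
    public static void Main()
    {
        Expression tree = CustomExpressions.Contains(
            Expression.Constant("hello world"), Expression.Constant("world"));
        Func<bool> compiled = Expression.Lambda<Func<bool>>(tree).Compile();
        Console.WriteLine(compiled()); // prints True
    }
}
```

If an operand is not a string, Expression.Call will throw, so a real implementation would probably want to check or convert the operand types first.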

So before, the chain was and_expression -> equality_expression, and it was modified to:

and_expression -> stringcontains_expression -> equality_expression. We may need to change this as needed.

I guess it means equality will take precedence over contains, which will take precedence over and?

I may need to rethink this. Say you have string1 ~= string2 == true. It would evaluate == between string2 and true first, which would be wrong. You'd have to use parentheses, as in (string1 ~= string2) == true.
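To make the two groupings concrete, here is a small sketch that builds both trees by hand with plain System.Linq.Expressions. It does not go through ExpressionEvaluator or its ExpressionHelper.BinaryOperator, which may convert types differently, so how the library itself fails on the second grouping is an assumption. The class name PrecedenceDemo is made up for illustration.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class PrecedenceDemo
{
    private static readonly MethodInfo StringContains =
        typeof(string).GetMethod("Contains", new[] { typeof(string) });

    public static void Main()
    {
        Expression s1 = Expression.Constant("hello world");
        Expression s2 = Expression.Constant("world");
        Expression t = Expression.Constant(true);

        // Intended grouping: (string1 ~= string2) == true
        Expression intended = Expression.Equal(Expression.Call(s1, StringContains, s2), t);
        Console.WriteLine(Expression.Lambda<Func<bool>>(intended).Compile()()); // prints True

        // Grouping produced by the current chain: string1 ~= (string2 == true)
        try
        {
            // Comparing a string with a bool is rejected when the tree is built.
            Expression inner = Expression.Equal(s2, t);
            Console.WriteLine(Expression.Call(s1, StringContains, inner));
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("rejected: " + ex.Message);
        }
    }
}
```

One way to get the intended grouping without parentheses would be to move the contains rule below equality_expression in the chain, so that equality_expression calls stringcontains_expression rather than the other way around. Whether that is the right trade-off depends on the use cases.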

Anyway, give me some use cases and we can tweak it.

Last edited May 2, 2014 at 12:21 PM by RupertAvery, version 1

Comments

mwpowellhtx Jul 12, 2015 at 4:03 PM 
So, should be fairly simple to add support for bitwise operators, on ordinals or enums?