I tried different tools (including ANTLR), but finally the easiest way I found was bison + flex. It's unbelievable that this technology from 1989 is still actively developed. Latest stable release is from May 14, 2011. Moreover, many important projects make use of it. Among them are Ruby, PHP, Google Go, Bash shell.
So I decided to create a minimalistic example, which works from scratch.
I published the code on github Calculator, so you can check it out.
The whole example is 88 lines long and evaluates common expressions, like 2+2*2-13*(7+19/2).
Let's start with lexer. In flex, you need to define regular expressions, which produce tokens. Such tokens are later processed by a scanner. So we have to define calculator.lex, like this:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
%{ | |
#include "calculator.tab.h" | |
using namespace std; | |
%} | |
%% | |
[ \t] ; | |
\+ { return PLUS; } | |
\- { return MINUS; } | |
\* { return MUL; } | |
\/ { return DIV; } | |
\( { return LPAREN; } | |
\) { return RPAREN; } | |
[0-9]+ { yylval.ival = atoi(yytext); return INT; } | |
[a-zA-Z0-9]+ { | |
yylval.sval = strdup(yytext); | |
return IDENT; | |
} | |
\n return ENDL; | |
. ; | |
%% |
Flex will generate yylex() function, which we can call later to produce tokens.
Next, we need to create a scanner (calculator.y), which specifies a grammar. It's simple like that:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
%{ | |
#include <cstdio> | |
#include <iostream> | |
using namespace std; | |
int yylex(); | |
extern "C" int yyparse(); | |
extern "C" FILE *yyin; | |
void yyerror(char *s); | |
%} | |
%union { | |
int ival; | |
float fval; | |
char *sval; | |
} | |
%token ENDL PLUS MINUS MUL DIV LPAREN RPAREN | |
%token <ival> INT | |
%token <sval> IDENT | |
%type <ival> expression1 | |
%type <ival> expression | |
%left PLUS MINUS | |
%left MUL DIV | |
%% | |
start: | |
expressions; | |
expressions: | |
expressions expression1 ENDL | |
| expression1 ENDL; | |
expression1: | |
expression { cout<<$1<<endl; }; | |
expression: | |
expression PLUS expression { $$ = $1 + $3; } | |
|expression MINUS expression { $$ = $1 - $3; } | |
|expression MUL expression { $$ = $1 * $3; } | |
|expression DIV expression { $$ = $1 / $3; } | |
|LPAREN expression RPAREN { $$ = $2; } | |
| INT { $$ = $1; }; | |
%% | |
int main(int argc, char *argv[]) { | |
if (argc!= 2) { | |
cout <<"Usage: <command> filename"<<endl; | |
return 1; | |
} | |
FILE *myfile = fopen(argv[1], "r"); | |
if (!myfile) { | |
cout << "I can't open "<<argv[1]<< endl; | |
return -1; | |
} | |
yyin = myfile; | |
do { | |
yyparse(); | |
} while (!feof(yyin)); | |
} | |
void yyerror(char *s) { | |
cout << "Parse error: " << s << endl; | |
exit(-1); | |
} |
Here, we specify types for all tokens using C/C++ union like structure. Variable $$ is used to store result of particular reductions.
Additionally, we need to specify %left precedence for +, -, *, / operators to resolve shift / reduce conflicts between them.
And that's basically it. We have a working expression parser.
I implemented it in a way, that executable takes a file name containing expressions as an argument.
So you can try ./calculator input.txt to see the result.
Nice blog, it's so knowledgeable, informative, and good looking site. I appreciate your hard work. Good job. Thank you for this wonderful sharing with us. Keep Sharing.
ReplyDeleteKindly visit us @ online idea lab