Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Experimental changes to Lemon for improved parser performance. |
---|---|
Downloads: | Tarball | ZIP archive | SQL archive |
Timelines: | family | ancestors | descendants | both | parser-performance |
Files: | files | file ages | folders |
SHA1: |
a65d583ce97b8c08157268bd054479cd |
User & Date: | drh 2016-02-16 21:19:49 |
Context
2016-02-17
| ||
01:18 | In Lemon, add the ability for the left-most RHS label to be the same as the LHS label, causing the LHS values to be written directly into the stack. (check-in: 4bb94c7c user: drh tags: parser-performance) | |
2016-02-16
| ||
21:19 | Experimental changes to Lemon for improved parser performance. (check-in: a65d583c user: drh tags: parser-performance) | |
13:04 | Minor simplification to the tokenizer. Slightly smaller and faster. (check-in: 9570b6b4 user: drh tags: trunk) | |
Changes
Changes to doc/lemon.html.
︙ | ︙ | |||
157 158 159 160 161 162 163 | the type of the third argument is integer, but the grammar will usually redefine this type to be some kind of structure. Typically the second argument will be a broad category of tokens such as ``identifier'' or ``number'' and the third argument will be the name of the identifier or the value of the number.</p> <p>The Parse() function may have either three or four arguments, | | > | | 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 | the type of the third argument is integer, but the grammar will usually redefine this type to be some kind of structure. Typically the second argument will be a broad category of tokens such as ``identifier'' or ``number'' and the third argument will be the name of the identifier or the value of the number.</p> <p>The Parse() function may have either three or four arguments, depending on the grammar. If the grammar specification file requests it (via the <a href='#extraarg'><tt>extra_argument</tt> directive</a>), the Parse() function will have a fourth parameter that can be of any type chosen by the programmer. The parser doesn't do anything with this argument except to pass it through to action routines. This is a convenient mechanism for passing state information down to the action routines without having to use global variables.</p> <p>A typical use of a Lemon parser might look something like the following: |
︙ | ︙ | |||
258 259 260 261 262 263 264 265 266 267 268 269 270 271 | <li>Lemon allows multiple parsers to be running simultaneously. Yacc and bison do not. </ul> These differences may cause some initial confusion for programmers with prior yacc and bison experience. But after years of experience using Lemon, I firmly believe that the Lemon way of doing things is better.</p> <h2>Input File Syntax</h2> <p>The main purpose of the grammar specification file for Lemon is to define the grammar for the parser. But the input file also specifies additional information Lemon requires to do its job. Most of the work in using Lemon is in writing an appropriate | > > > > > > | 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 | <li>Lemon allows multiple parsers to be running simultaneously. Yacc and bison do not. </ul> These differences may cause some initial confusion for programmers with prior yacc and bison experience. But after years of experience using Lemon, I firmly believe that the Lemon way of doing things is better.</p> <p><i>Updated as of 2016-02-16:</i> The text above was written in the 1990s. We are told that Bison has lately been enhanced to support the tokenizer-calls-parser paradigm used by Lemon, and to obviate the need for global variables.</p> <h2>Input File Syntax</h2> <p>The main purpose of the grammar specification file for Lemon is to define the grammar for the parser. But the input file also specifies additional information Lemon requires to do its job. Most of the work in using Lemon is in writing an appropriate |
︙ | ︙ | |||
613 614 615 616 617 618 619 620 621 622 623 624 625 626 | the destructor is not called in this circumstance.</p> <p>By appropriate use of destructors, it is possible to build a parser using Lemon that can be used within a long-running program, such as a GUI, that will not leak memory or other resources. To do the same using yacc or bison is much more difficult.</p> <h4>The <tt>%extra_argument</tt> directive</h4> The %extra_argument directive instructs Lemon to add a 4th parameter to the parameter list of the Parse() function it generates. Lemon doesn't do anything itself with this extra argument, but it does make the argument available to C-code action routines, destructors, and so forth. For example, if the grammar file contains:</p> | > | 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 | the destructor is not called in this circumstance.</p> <p>By appropriate use of destructors, it is possible to build a parser using Lemon that can be used within a long-running program, such as a GUI, that will not leak memory or other resources. To do the same using yacc or bison is much more difficult.</p> <a name="extraarg"></a> <h4>The <tt>%extra_argument</tt> directive</h4> The %extra_argument directive instructs Lemon to add a 4th parameter to the parameter list of the Parse() function it generates. Lemon doesn't do anything itself with this extra argument, but it does make the argument available to C-code action routines, destructors, and so forth. For example, if the grammar file contains:</p> |
︙ | ︙ |
Changes to ext/fts5/fts5parse.y.
︙ | ︙ | |||
30 31 32 33 34 35 36 | %syntax_error { UNUSED_PARAM(yymajor); /* Silence a compiler warning */ sqlite3Fts5ParseError( pParse, "fts5: syntax error near \"%.*s\"",TOKEN.n,TOKEN.p ); } %stack_overflow { | < | 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | %syntax_error { UNUSED_PARAM(yymajor); /* Silence a compiler warning */ sqlite3Fts5ParseError( pParse, "fts5: syntax error near \"%.*s\"",TOKEN.n,TOKEN.p ); } %stack_overflow { sqlite3Fts5ParseError(pParse, "fts5: parser stack overflow"); } // The name of the generated procedure that implements the parser // is as follows: %name sqlite3Fts5Parser |
︙ | ︙ |
Changes to src/parse.y.
︙ | ︙ | |||
31 32 33 34 35 36 37 | // %syntax_error { UNUSED_PARAMETER(yymajor); /* Silence some compiler warnings */ assert( TOKEN.z[0] ); /* The tokenizer always gives us a token */ sqlite3ErrorMsg(pParse, "near \"%T\": syntax error", &TOKEN); } %stack_overflow { | < | 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | // %syntax_error { UNUSED_PARAMETER(yymajor); /* Silence some compiler warnings */ assert( TOKEN.z[0] ); /* The tokenizer always gives us a token */ sqlite3ErrorMsg(pParse, "near \"%T\": syntax error", &TOKEN); } %stack_overflow { sqlite3ErrorMsg(pParse, "parser stack overflow"); } // The name of the generated procedure that implements the parser // is as follows: %name sqlite3Parser |
︙ | ︙ |
Changes to tool/lempar.c.
︙ | ︙ | |||
514 515 516 517 518 519 520 | #endif return yy_action[i]; } /* ** The following routine is called if the stack overflows. */ | | | 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 | #endif return yy_action[i]; } /* ** The following routine is called if the stack overflows. */ static void yyStackOverflow(yyParser *yypParser){ ParseARG_FETCH; yypParser->yyidx--; #ifndef NDEBUG if( yyTraceFILE ){ fprintf(yyTraceFILE,"%sStack Overflow!\n",yyTracePrompt); } #endif |
︙ | ︙ | |||
558 559 560 561 562 563 564 | /* ** Perform a shift action. */ static void yy_shift( yyParser *yypParser, /* The parser to be shifted */ int yyNewState, /* The new state to shift in */ int yyMajor, /* The major token to shift in */ | | | | | | 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 | /* ** Perform a shift action. */ static void yy_shift( yyParser *yypParser, /* The parser to be shifted */ int yyNewState, /* The new state to shift in */ int yyMajor, /* The major token to shift in */ ParseTOKENTYPE yyMinor /* The minor token to shift in */ ){ yyStackEntry *yytos; yypParser->yyidx++; #ifdef YYTRACKMAXSTACKDEPTH if( yypParser->yyidx>yypParser->yyidxMax ){ yypParser->yyidxMax = yypParser->yyidx; } #endif #if YYSTACKDEPTH>0 if( yypParser->yyidx>=YYSTACKDEPTH ){ yyStackOverflow(yypParser); return; } #else if( yypParser->yyidx>=yypParser->yystksz ){ yyGrowStack(yypParser); if( yypParser->yyidx>=yypParser->yystksz ){ yyStackOverflow(yypParser); return; } } #endif yytos = &yypParser->yystack[yypParser->yyidx]; yytos->stateno = (YYACTIONTYPE)yyNewState; yytos->major = (YYCODETYPE)yyMajor; yytos->minor.yy0 = yyMinor; yyTraceShift(yypParser, yyNewState); } /* The following table contains information about every rule that ** is used during the reduce. */ static const struct { |
︙ | ︙ | |||
623 624 625 626 627 628 629 | if( yyTraceFILE && yyruleno>=0 && yyruleno<(int)(sizeof(yyRuleName)/sizeof(yyRuleName[0])) ){ yysize = yyRuleInfo[yyruleno].nrhs; fprintf(yyTraceFILE, "%sReduce [%s], go to state %d.\n", yyTracePrompt, yyRuleName[yyruleno], yymsp[-yysize].stateno); } #endif /* NDEBUG */ | | > > > > > > > > > > > > > > > > > > > > > > > > > < < < < < < | | | | | | | < < < > | 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 | if( yyTraceFILE && yyruleno>=0 && yyruleno<(int)(sizeof(yyRuleName)/sizeof(yyRuleName[0])) ){ yysize = yyRuleInfo[yyruleno].nrhs; fprintf(yyTraceFILE, "%sReduce [%s], go to state %d.\n", yyTracePrompt, yyRuleName[yyruleno], yymsp[-yysize].stateno); } #endif /* NDEBUG */ /* yygotominor = yyzerominor; */ /* Check that the stack is large enough to grow by a single entry ** if the RHS of the rule is empty. This ensures that there is room ** enough on the stack to push the LHS value */ if( yyRuleInfo[yyruleno].nrhs==0 ){ #ifdef YYTRACKMAXSTACKDEPTH if( yypParser->yyidx>yypParser->yyidxMax ){ yypParser->yyidxMax = yypParser->yyidx; } #endif #if YYSTACKDEPTH>0 if( yypParser->yyidx>=YYSTACKDEPTH ){ yyStackOverflow(yypParser); return; } #else if( yypParser->yyidx>=yypParser->yystksz ){ yyGrowStack(yypParser); if( yypParser->yyidx>=yypParser->yystksz ){ yyStackOverflow(yypParser); return; } } #endif } switch( yyruleno ){ /* Beginning here are the reduction cases. A typical example ** follows: ** case 0: ** #line <lineno> <grammarfile> ** { ... } // User supplied code ** #line <lineno> <thisfile> ** break; */ /********** Begin reduce actions **********************************************/ %% /********** End reduce actions ************************************************/ }; assert( yyruleno>=0 && yyruleno<sizeof(yyRuleInfo)/sizeof(yyRuleInfo[0]) ); yygoto = yyRuleInfo[yyruleno].lhs; yysize = yyRuleInfo[yyruleno].nrhs; yyact = yy_find_reduce_action(yymsp[-yysize].stateno,(YYCODETYPE)yygoto); if( yyact <= YY_MAX_SHIFTREDUCE ){ if( yyact>YY_MAX_SHIFT ) yyact += YY_MIN_REDUCE - YY_MIN_SHIFTREDUCE; yypParser->yyidx -= yysize - 1; yymsp -= yysize-1; yymsp->stateno = (YYACTIONTYPE)yyact; yymsp->major = (YYCODETYPE)yygoto; yymsp->minor = yygotominor; yyTraceShift(yypParser, yyact); }else{ assert( yyact == YY_ACCEPT_ACTION ); yypParser->yyidx -= yysize; yy_accept(yypParser); } } /* ** The following code executes when the parse fails */ |
︙ | ︙ | |||
694 695 696 697 698 699 700 | /* ** The following code executes when a syntax error first occurs. */ static void yy_syntax_error( yyParser *yypParser, /* The parser */ int yymajor, /* The major type of the error token */ | | | | 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 | /* ** The following code executes when a syntax error first occurs. */ static void yy_syntax_error( yyParser *yypParser, /* The parser */ int yymajor, /* The major type of the error token */ ParseTOKENTYPE yyminor /* The minor type of the error token */ ){ ParseARG_FETCH; #define TOKEN yyminor /************ Begin %syntax_error code ****************************************/ %% /************ End %syntax_error code ******************************************/ ParseARG_STORE; /* Suppress warning about unused %extra_argument variable */ } /* |
︙ | ︙ | |||
765 766 767 768 769 770 771 | yyParser *yypParser; /* The parser */ /* (re)initialize the parser, if necessary */ yypParser = (yyParser*)yyp; if( yypParser->yyidx<0 ){ #if YYSTACKDEPTH<=0 if( yypParser->yystksz <=0 ){ | < < | < | > | 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 | yyParser *yypParser; /* The parser */ /* (re)initialize the parser, if necessary */ yypParser = (yyParser*)yyp; if( yypParser->yyidx<0 ){ #if YYSTACKDEPTH<=0 if( yypParser->yystksz <=0 ){ yyStackOverflow(yypParser); return; } #endif yypParser->yyidx = 0; #ifndef YYNOERRORRECOVERY yypParser->yyerrcnt = -1; #endif yypParser->yystack[0].stateno = 0; yypParser->yystack[0].major = 0; #ifndef NDEBUG if( yyTraceFILE ){ fprintf(yyTraceFILE,"%sInitialize. Empty stack. State 0\n", yyTracePrompt); } #endif } #if !defined(YYERRORSYMBOL) && !defined(YYNOERRORRECOVERY) yyendofinput = (yymajor==0); #endif ParseARG_STORE; #ifndef NDEBUG if( yyTraceFILE ){ fprintf(yyTraceFILE,"%sInput '%s'\n",yyTracePrompt,yyTokenName[yymajor]); } #endif do{ yyact = yy_find_shift_action(yypParser,(YYCODETYPE)yymajor); if( yyact <= YY_MAX_SHIFTREDUCE ){ if( yyact > YY_MAX_SHIFT ) yyact += YY_MIN_REDUCE - YY_MIN_SHIFTREDUCE; yy_shift(yypParser,yyact,yymajor,yyminor); #ifndef YYNOERRORRECOVERY yypParser->yyerrcnt--; #endif yymajor = YYNOCODE; }else if( yyact <= YY_MAX_REDUCE ){ yy_reduce(yypParser,yyact-YY_MIN_REDUCE); }else{ assert( yyact == YY_ERROR_ACTION ); yyminorunion.yy0 = yyminor; #ifdef YYERRORSYMBOL int yymx; #endif #ifndef NDEBUG if( yyTraceFILE ){ fprintf(yyTraceFILE,"%sSyntax Error!\n",yyTracePrompt); } |
︙ | ︙ | |||
838 839 840 841 842 843 844 | ** ** * Begin accepting and shifting new tokens. No new error ** processing will occur until three tokens have been ** shifted successfully. ** */ if( yypParser->yyerrcnt<0 ){ | | | | < < | | | | 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 | ** ** * Begin accepting and shifting new tokens. No new error ** processing will occur until three tokens have been ** shifted successfully. ** */ if( yypParser->yyerrcnt<0 ){ yy_syntax_error(yypParser,yymajor,yyminor); } yymx = yypParser->yystack[yypParser->yyidx].major; if( yymx==YYERRORSYMBOL || yyerrorhit ){ #ifndef NDEBUG if( yyTraceFILE ){ fprintf(yyTraceFILE,"%sDiscard input token %s\n", yyTracePrompt,yyTokenName[yymajor]); } #endif yy_destructor(yypParser, (YYCODETYPE)yymajor, &yyminorunion); yymajor = YYNOCODE; }else{ while( yypParser->yyidx >= 0 && yymx != YYERRORSYMBOL && (yyact = yy_find_reduce_action( yypParser->yystack[yypParser->yyidx].stateno, YYERRORSYMBOL)) >= YY_MIN_REDUCE ){ yy_pop_parser_stack(yypParser); } if( yypParser->yyidx < 0 || yymajor==0 ){ yy_destructor(yypParser,(YYCODETYPE)yymajor,&yyminorunion); yy_parse_failed(yypParser); yymajor = YYNOCODE; }else if( yymx!=YYERRORSYMBOL ){ yy_shift(yypParser,yyact,YYERRORSYMBOL,yyminor); } } yypParser->yyerrcnt = 3; yyerrorhit = 1; #elif defined(YYNOERRORRECOVERY) /* If the YYNOERRORRECOVERY macro is defined, then do not attempt to ** do any kind of error recovery. Instead, simply invoke the syntax ** error routine and continue going as if nothing had happened. ** ** Applications can set this macro (for example inside %include) if ** they intend to abandon the parse upon the first syntax error seen. */ yy_syntax_error(yypParser,yymajor, yyminor); yy_destructor(yypParser,(YYCODETYPE)yymajor,&yyminorunion); yymajor = YYNOCODE; #else /* YYERRORSYMBOL is not defined */ /* This is what we do if the grammar does not define ERROR: ** ** * Report an error message, and throw away the input token. ** ** * If the input token is $, then fail the parse. ** ** As before, subsequent error messages are suppressed until ** three input tokens have been successfully shifted. */ if( yypParser->yyerrcnt<=0 ){ yy_syntax_error(yypParser,yymajor, yyminor); } yypParser->yyerrcnt = 3; yy_destructor(yypParser,(YYCODETYPE)yymajor,&yyminorunion); if( yyendofinput ){ yy_parse_failed(yypParser); } yymajor = YYNOCODE; |
︙ | ︙ |