Overview
Comment: | Document the "%token" directive for Lemon. This directive has been in place for a while, but was previously undocumented. |
---|---|
SHA3-256: | 36624d3740a8d095eee061bcc5037dea |
User & Date: | drh 2021-03-28 20:44:01 |
Context
2021-03-28
23:37 | Alternative implementation of the comparison opcode speed-up of check-in [4a8805d9a66dc888] that should pass muster with UBSAN. (check-in: afb18f64 user: drh tags: trunk) | |
20:44 | Document the "%token" directive for Lemon. This directive has been in place for a while, but was previously undocumented. (check-in: 36624d37 user: drh tags: trunk) | |
2021-03-27
16:21 | For the sqlite3_bind_text16 TCL binding used for testing, ensure that there are at least 3 terminating zeros, so that there will always be a \u0000 character even if the original byte sequence is an odd number of bytes. (check-in: c23d092f user: drh tags: trunk) | |
Changes
Changes to doc/lemon.html.
︙ | ︙ | |||
  <li><tt><a href='#parse_accept'>%parse_accept</a></tt>
  <li><tt><a href='#parse_failure'>%parse_failure</a></tt>
  <li><tt><a href='#pright'>%right</a></tt>
  <li><tt><a href='#stack_overflow'>%stack_overflow</a></tt>
  <li><tt><a href='#stack_size'>%stack_size</a></tt>
  <li><tt><a href='#start_symbol'>%start_symbol</a></tt>
  <li><tt><a href='#syntax_error'>%syntax_error</a></tt>
> <li><tt><a href='#token'>%token</a></tt>
  <li><tt><a href='#token_class'>%token_class</a></tt>
  <li><tt><a href='#token_destructor'>%token_destructor</a></tt>
  <li><tt><a href='#token_prefix'>%token_prefix</a></tt>
  <li><tt><a href='#token_type'>%token_type</a></tt>
  <li><tt><a href='#ptype'>%type</a></tt>
  <li><tt><a href='#pwildcard'>%wildcard</a></tt>
  </ul>
︙ | ︙ | |||
  %start_symbol prog
  </pre>

  <a id='syntax_error'></a>
  <h4>4.4.19 The <tt>%syntax_error</tt> directive</h4>

  <p>See <a href='#errors'>Error Processing</a>.</p>

> <a id='token'></a>
> <h4>4.4.20 The <tt>%token</tt> directive</h4>
>
> <p>Tokens are normally created automatically, the first time they are used.
> Any identifier that begins with an upper-case letter is a token.
>
> <p>Sometimes it is useful to declare tokens in advance, however. The
> integer values assigned to each token determined by the order in which
> the tokens are seen. So by declaring tokens in advance, it is possible to
> cause some tokens to have low-numbered values, which might be desirable in
> some grammers, or to have sequential values assigned to a sequence of
> related tokens. For this reason, the %token directive is provided to
> declare tokens in advance. The syntax is as follows:
>
> <blockquote>
> <tt>%token</tt> <i>TOKEN</i> <i>TOKEN...</i> <b>.</b>
> </blockquote></p>
>
> <p>The %token directive is followed by zero or more token symbols and
> terminated by a single ".". Each token named is created if it does not
> already exist. Tokens are created in order.
>
>
  <a id='token_class'></a>
| <h4>4.4.21 The <tt>%token_class</tt> directive</h4>

  <p>Undocumented. Appears to be related to the MULTITERMINAL concept.
  <a href='http://sqlite.org/src/fdiff?v1=796930d5fc2036c7&v2=624b24c5dc048e09&sbs=0'>Implementation</a>.</p>

  <a id='token_destructor'></a>
| <h4>4.4.22 The <tt>%token_destructor</tt> directive</h4>

  <p>The <tt>%destructor</tt> directive assigns a destructor to a non-terminal
  symbol. (See the description of the
  <tt><a href='%destructor'>%destructor</a></tt> directive above.)
  The <tt>%token_destructor</tt> directive does the same thing
  for all terminal symbols.</p>

  <p>Unlike non-terminal symbols, which may each have a different data type for
  their values, terminals all use the same data type (defined by
  the <tt><a href='#token_type'>%token_type</a></tt> directive)
  and so they use a common destructor.
  Other than that, the token destructor works just like the non-terminal
  destructors.</p>

  <a id='token_prefix'></a>
| <h4>4.4.23 The <tt>%token_prefix</tt> directive</h4>

  <p>Lemon generates #defines that assign small integer constants
  to each terminal symbol in the grammar. If desired, Lemon will
  add a prefix specified by this directive
  to each of the #defines it generates.</p>

  <p>So if the default output of Lemon looked like this:</p>
︙ | ︙ | |||
  #define TOKEN_AND 1
  #define TOKEN_MINUS 2
  #define TOKEN_OR 3
  #define TOKEN_PLUS 4
  </pre>

  <a id='token_type'></a><a id='ptype'></a>
| <h4>4.4.24 The <tt>%token_type</tt> and <tt>%type</tt> directives</h4>

  <p>These directives are used to specify the data types for values
  on the parser's stack associated with terminal and non-terminal
  symbols. The values of all terminal symbols must be of the same
  type. This turns out to be the same data type as the 3rd parameter
  to the Parse() function generated by Lemon. Typically, you will
  make the value of a terminal symbol be a pointer to some kind of
︙ | ︙ | |||
  the grammar designer should keep in mind that the size of the union
  will be the size of its largest element. So if you have a single
  non-terminal whose data type requires 1K of storage, then your 100
  entry parser stack will require 100K of heap space. If you are willing
  and able to pay that price, fine. You just need to know.</p>

  <a id='pwildcard'></a>
| <h4>4.4.25 The <tt>%wildcard</tt> directive</h4>

  <p>The <tt>%wildcard</tt> directive is followed by a single token name and a
  period. This directive specifies that the identified token should
  match any input token.</p>

  <p>When the generated parser has the choice of matching an input against
  the wildcard token and some other token, the other token is always used.
︙ | ︙ |
Changes to tool/lemon.c.
︙ | ︙ | |||
        }
        break;
      case WAITING_FOR_TOKEN_NAME:
        /* Tokens do not have to be declared before use. But they can be
        ** in order to control their assigned integer number. The number for
        ** each token is assigned when it is first seen. So by including
        **
|       ** %token ONE TWO THREE.
        **
        ** early in the grammar file, that assigns small consecutive values
        ** to each of the tokens ONE TWO and THREE.
        */
        if( x[0]=='.' ){
          psp->state = WAITING_FOR_DECL_OR_RULE;
        }else if( !ISUPPER(x[0]) ){
︙ | ︙ |
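The numbering rule described in the tool/lemon.c comment (a token gets its number the first time it is seen) can be illustrated with a small stand-alone sketch. This is not Lemon's implementation: `TokenTable`, `token_number`, and `MAX_TOKENS` are hypothetical names invented for this example, and numbering from 1 is an assumption chosen to match the `#define TOKEN_AND 1` style output shown in the documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 64  /* hypothetical capacity, for illustration only */

/* Hypothetical symbol table: token names stored in order of first sight. */
typedef struct {
  const char *names[MAX_TOKENS];
  int count;
} TokenTable;

/* Return the number assigned to `name`. A name that has not been seen
** before receives the next unused number, starting at 1. Returns -1 if
** the table is full. The caller keeps ownership of the name strings. */
int token_number(TokenTable *t, const char *name){
  int i;
  assert( t!=NULL && name!=NULL );
  for(i=0; i<t->count; i++){
    if( strcmp(t->names[i], name)==0 ) return i+1;  /* already numbered */
  }
  if( t->count>=MAX_TOKENS ) return -1;
  t->names[t->count++] = name;   /* first sight: take the next number */
  return t->count;
}
```

Under these assumptions, feeding a zero-initialized table the names ONE, TWO, THREE first (as an early `%token ONE TWO THREE.` would) yields 1, 2, 3, and any token first seen later in the rules gets a higher number. Looking a name up again returns the number it already has.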