SQLite

Check-in [36624d37]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Document the "%token" directive for Lemon. This directive has been in place for a while, but was previously undocumented.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 36624d3740a8d095eee061bcc5037deabddb88a53444ec1a956a8af7684efa43
User & Date: drh 2021-03-28 20:44:01
Context
2021-03-28
23:37
Alternative implementation of the comparison opcode speed-up of check-in [4a8805d9a66dc888] that should pass muster with UBSAN. (check-in: afb18f64 user: drh tags: trunk)
20:44
Document the "%token" directive for Lemon. This directive has been in place for a while, but was previously undocumented. (check-in: 36624d37 user: drh tags: trunk)
2021-03-27
16:21
For the sqlite3_bind_text16 TCL binding used for testing, ensure that there are at least 3 terminating zeros, so that there will always be a \u0000 character even if the original byte sequence is an odd number of bytes. (check-in: c23d092f user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to doc/lemon.html.
693
694
695
696
697
698
699

700
701
702
703
704
705
706
<li><tt><a href='#parse_accept'>%parse_accept</a></tt>
<li><tt><a href='#parse_failure'>%parse_failure</a></tt>
<li><tt><a href='#pright'>%right</a></tt>
<li><tt><a href='#stack_overflow'>%stack_overflow</a></tt>
<li><tt><a href='#stack_size'>%stack_size</a></tt>
<li><tt><a href='#start_symbol'>%start_symbol</a></tt>
<li><tt><a href='#syntax_error'>%syntax_error</a></tt>

<li><tt><a href='#token_class'>%token_class</a></tt>
<li><tt><a href='#token_destructor'>%token_destructor</a></tt>
<li><tt><a href='#token_prefix'>%token_prefix</a></tt>
<li><tt><a href='#token_type'>%token_type</a></tt>
<li><tt><a href='#ptype'>%type</a></tt>
<li><tt><a href='#pwildcard'>%wildcard</a></tt>
</ul>







>







693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
<li><tt><a href='#parse_accept'>%parse_accept</a></tt>
<li><tt><a href='#parse_failure'>%parse_failure</a></tt>
<li><tt><a href='#pright'>%right</a></tt>
<li><tt><a href='#stack_overflow'>%stack_overflow</a></tt>
<li><tt><a href='#stack_size'>%stack_size</a></tt>
<li><tt><a href='#start_symbol'>%start_symbol</a></tt>
<li><tt><a href='#syntax_error'>%syntax_error</a></tt>
<li><tt><a href='#token'>%token</a></tt>
<li><tt><a href='#token_class'>%token_class</a></tt>
<li><tt><a href='#token_destructor'>%token_destructor</a></tt>
<li><tt><a href='#token_prefix'>%token_prefix</a></tt>
<li><tt><a href='#token_type'>%token_type</a></tt>
<li><tt><a href='#ptype'>%type</a></tt>
<li><tt><a href='#pwildcard'>%wildcard</a></tt>
</ul>
1074
1075
1076
1077
1078
1079
1080
1081























1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
   %start_symbol  prog
</pre>

<a id='syntax_error'></a>
<h4>4.4.19 The <tt>%syntax_error</tt> directive</h4>

<p>See <a href='#errors'>Error Processing</a>.</p>
























<a id='token_class'></a>
<h4>4.4.20 The <tt>%token_class</tt> directive</h4>

<p>Undocumented.  Appears to be related to the MULTITERMINAL concept.
<a href='http://sqlite.org/src/fdiff?v1=796930d5fc2036c7&v2=624b24c5dc048e09&sbs=0'>Implementation</a>.</p>

<a id='token_destructor'></a>
<h4>4.4.21 The <tt>%token_destructor</tt> directive</h4>

<p>The <tt>%destructor</tt> directive assigns a destructor to a non-terminal
symbol.  (See the description of the
<tt><a href='%destructor'>%destructor</a></tt> directive above.)
The <tt>%token_destructor</tt> directive does the same thing
for all terminal symbols.</p>

<p>Unlike non-terminal symbols, which may each have a different data type
for their values, terminals all use the same data type (defined by
the <tt><a href='#token_type'>%token_type</a></tt> directive)
and so they use a common destructor.
Other than that, the token destructor works just like the non-terminal
destructors.</p>

<a id='token_prefix'></a>
<h4>4.4.22 The <tt>%token_prefix</tt> directive</h4>

<p>Lemon generates #defines that assign small integer constants
to each terminal symbol in the grammar.  If desired, Lemon will
add a prefix specified by this directive
to each of the #defines it generates.</p>

<p>So if the default output of Lemon looked like this:</p>








>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

|





|















|







1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
   %start_symbol  prog
</pre>

<a id='syntax_error'></a>
<h4>4.4.19 The <tt>%syntax_error</tt> directive</h4>

<p>See <a href='#errors'>Error Processing</a>.</p>

<a id='token'></a>
<h4>4.4.20 The <tt>%token</tt> directive</h4>

<p>Tokens are normally created automatically, the first time they are used.
Any identifier that begins with an upper-case letter is a token.

<p>Sometimes it is useful to declare tokens in advance, however.  The
integer values assigned to each token determined by the order in which
the tokens are seen.  So by declaring tokens in advance, it is possible to
cause some tokens to have low-numbered values, which might be desirable in
some grammers, or to have sequential values assigned to a sequence of
related tokens.  For this reason, the %token directive is provided to
declare tokens in advance.  The syntax is as follows:

<blockquote>
<tt>%token</tt> <i>TOKEN</i> <i>TOKEN...</i> <b>.</b>
</blockquote></p>

<p>The %token directive is followed by zero or more token symbols and
terminated by a single ".".  Each token named is created if it does not
already exist.  Tokens are created in order.


<a id='token_class'></a>
<h4>4.4.21 The <tt>%token_class</tt> directive</h4>

<p>Undocumented.  Appears to be related to the MULTITERMINAL concept.
<a href='http://sqlite.org/src/fdiff?v1=796930d5fc2036c7&v2=624b24c5dc048e09&sbs=0'>Implementation</a>.</p>

<a id='token_destructor'></a>
<h4>4.4.22 The <tt>%token_destructor</tt> directive</h4>

<p>The <tt>%destructor</tt> directive assigns a destructor to a non-terminal
symbol.  (See the description of the
<tt><a href='%destructor'>%destructor</a></tt> directive above.)
The <tt>%token_destructor</tt> directive does the same thing
for all terminal symbols.</p>

<p>Unlike non-terminal symbols, which may each have a different data type
for their values, terminals all use the same data type (defined by
the <tt><a href='#token_type'>%token_type</a></tt> directive)
and so they use a common destructor.
Other than that, the token destructor works just like the non-terminal
destructors.</p>

<a id='token_prefix'></a>
<h4>4.4.23 The <tt>%token_prefix</tt> directive</h4>

<p>Lemon generates #defines that assign small integer constants
to each terminal symbol in the grammar.  If desired, Lemon will
add a prefix specified by this directive
to each of the #defines it generates.</p>

<p>So if the default output of Lemon looked like this:</p>
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
    #define TOKEN_AND        1
    #define TOKEN_MINUS      2
    #define TOKEN_OR         3
    #define TOKEN_PLUS       4
</pre>

<a id='token_type'></a><a id='ptype'></a>
<h4>4.4.23 The <tt>%token_type</tt> and <tt>%type</tt> directives</h4>

<p>These directives are used to specify the data types for values
on the parser's stack associated with terminal and non-terminal
symbols.  The values of all terminal symbols must be of the same
type.  This turns out to be the same data type as the 3rd parameter
to the Parse() function generated by Lemon.  Typically, you will
make the value of a terminal symbol be a pointer to some kind of







|







1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
    #define TOKEN_AND        1
    #define TOKEN_MINUS      2
    #define TOKEN_OR         3
    #define TOKEN_PLUS       4
</pre>

<a id='token_type'></a><a id='ptype'></a>
<h4>4.4.24 The <tt>%token_type</tt> and <tt>%type</tt> directives</h4>

<p>These directives are used to specify the data types for values
on the parser's stack associated with terminal and non-terminal
symbols.  The values of all terminal symbols must be of the same
type.  This turns out to be the same data type as the 3rd parameter
to the Parse() function generated by Lemon.  Typically, you will
make the value of a terminal symbol be a pointer to some kind of
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
the grammar designer should keep in mind that the size of the union
will be the size of its largest element.  So if you have a single
non-terminal whose data type requires 1K of storage, then your 100
entry parser stack will require 100K of heap space.  If you are willing
and able to pay that price, fine.  You just need to know.</p>

<a id='pwildcard'></a>
<h4>4.4.24 The <tt>%wildcard</tt> directive</h4>

<p>The <tt>%wildcard</tt> directive is followed by a single token name and a
period.  This directive specifies that the identified token should
match any input token.</p>

<p>When the generated parser has the choice of matching an input against
the wildcard token and some other token, the other token is always used.







|







1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
the grammar designer should keep in mind that the size of the union
will be the size of its largest element.  So if you have a single
non-terminal whose data type requires 1K of storage, then your 100
entry parser stack will require 100K of heap space.  If you are willing
and able to pay that price, fine.  You just need to know.</p>

<a id='pwildcard'></a>
<h4>4.4.25 The <tt>%wildcard</tt> directive</h4>

<p>The <tt>%wildcard</tt> directive is followed by a single token name and a
period.  This directive specifies that the identified token should
match any input token.</p>

<p>When the generated parser has the choice of matching an input against
the wildcard token and some other token, the other token is always used.
Changes to tool/lemon.c.
2704
2705
2706
2707
2708
2709
2710
2711
2712
2713
2714
2715
2716
2717
2718
      }
      break;
    case WAITING_FOR_TOKEN_NAME:
      /* Tokens do not have to be declared before use.  But they can be
      ** in order to control their assigned integer number.  The number for
      ** each token is assigned when it is first seen.  So by including
      **
      **     %token ONE TWO THREE
      **
      ** early in the grammar file, that assigns small consecutive values
      ** to each of the tokens ONE TWO and THREE.
      */
      if( x[0]=='.' ){
        psp->state = WAITING_FOR_DECL_OR_RULE;
      }else if( !ISUPPER(x[0]) ){







|







2704
2705
2706
2707
2708
2709
2710
2711
2712
2713
2714
2715
2716
2717
2718
      }
      break;
    case WAITING_FOR_TOKEN_NAME:
      /* Tokens do not have to be declared before use.  But they can be
      ** in order to control their assigned integer number.  The number for
      ** each token is assigned when it is first seen.  So by including
      **
      **     %token ONE TWO THREE.
      **
      ** early in the grammar file, that assigns small consecutive values
      ** to each of the tokens ONE TWO and THREE.
      */
      if( x[0]=='.' ){
        psp->state = WAITING_FOR_DECL_OR_RULE;
      }else if( !ISUPPER(x[0]) ){