SQLite Forum

lemon - optional namespace support
Login

lemon - optional namespace support

(1) By anonymous on 2020-04-29 13:34:43 [source]

I wanted to follow up on a previous posting with some patches lemon.c.patch and lempar.c.patch for helping integrate lemon parsers into larger C++ code bases. The patches are not overly large or complicated, but most certainly help with avoiding symbol clashes and promote good code encapsulation. We've successfully used lemon in OpenFOAM for handling parsed expressions, to manipulate much larger computational fields. Code snippets show part of embedding lemon with a variety of m4-macros to arguably improve maintenance.

As stated in the first post:

I hope that the changes are useful enough for broader interest and/or sufficiently encapsulated that they could be incorporated into the sources.

If the proposed patches need reworking, I would be happy to comply.

Below is some more of the original posting, with the hope that it will garner some interest this time.

Cheers,

/mark (contact information visible on the corresponding commits)

I've fairly recently been using lemon for building several parsers in C++ and found what I believe to be a minimalist means of avoiding symbol clashes without adding bloat, or affecting C code generation.

New -e command line option to define the code extension. By default this is c, but can be used to define something like -ecxx etc to trigger the correct make rules for a C++ compiler.

New (optional) %namespace directive. This can be used to embed the Parse* routines into a C++ namespace. If the %namespace is not defined, there is no change in behaviour. The namespace can be anonymous or contain multiple nested namespaces. For example,

    %namespace {}
    %namespace {ns1::ns2::ns3}
This makes it easy to generate lemon parsers for C++ without any potential symbol clashes, and it imposes no C++ interface on the user. It does not fundamentally change how lemon works.

(2) By ddevienne on 2020-04-29 15:06:01 in reply to 1 [link] [source]

Hi. Why do you open and close the NS several times in lempar?
I can't easily see the code in between, but what prevents you
from putting the whole thing in a namespace?

Aside from that, your patches appear very reasonable, from a quick
glance. Although that doesn't do any good, sorry :) --DD

(3) By anonymous on 2020-04-29 18:32:30 in reply to 2 [link] [source]

True, that isn't obvious from the patches, but fits with the lemon structure. The lempar.c code is written quite intelligently, so it has a number of file-static code and data bits. For example,

static void yy_pop_parser_stack(yyParser *pParser){ ... }
Followed by externally visible functions bits where it is used. For example,
%namespace_begin
void ParseFinalize(void *p){ ... }
%namespace_end
I personally think this is a very reasonable way to do things. Declare file-statics just before they required.

To nicely enclose in a single namespace would entail juggling most of the file-static bits together near the top, and the namespace-scoped near the bottom. This would be a fairly heavy patch, with the usual potentials for introducing bugs. Simply wrapping everything as is currently is means that some of the names can potentially leak out into the enclosing namespace, which isn't really desirable. I also ran into issues with addressing pieces not in my own scope etc (ie, headaches).

Hope that clarifies. /mark

(4) By ddevienne on 2020-04-30 09:14:22 in reply to 3 [link] [source]

Makes sense. Thanks for the explanation.
Now you just have to see if Richard bites.

(5) By anonymous on 2020-06-26 10:19:57 in reply to 4 [link] [source]

Any hints for how to get Richard's attention enough and to possibly convince him?

The more times that I have need make small extensions to my grammars and recompile with lemon (even cross-compiling), the more thankful I am to have made the choice of using lemon. It is almost scary that it works so well.

(6) By Warren Young (wyoung) on 2020-06-26 15:11:56 in reply to 5 [link] [source]

Any hints for how to get Richard's attention enough and to possibly convince him?

SQLite isn't published under a viral software license, so a person cannot just drop a patch on a public site, if their goal is to get the patch into SQLite. drh couldn't legally apply your patch even if he wanted to. The author of that patch has full copyright protections in that patch in virtually every country in the world until it is explicitly licensed in a way that allows it to be copied.

In the case of SQLite, that means drh needs the legal right to re-distribute your change under the same terms as SQLite: public domain. This means you must explicitly relinquish your legal rights to that patch. That link also goes into more detail about your very question.

You can try sending a signed copyright release to HWACI's postal address. That would at least allow drh to accept the patch as-is.

However, it is also the case that this patch doesn't provide a feature SQLite needs, so you'd have to justify asking drh to carry the maintenance load for it.

What's been the cost to your project to carry the patch? How often do you have to update it to track upstream changes?

If it almost always applies cleanly, then I don't see why you need to upstream it. That's FOSS working like it should.

(7) By anonymous on 2020-06-29 19:19:03 in reply to 6 [link] [source]

The additional maintenance load for me to carry the patch is not the point - I'm not sure what I said that gave that impression. I also do understand the point about not simply dumping code on someone else to maintain.

The motivation was about sharing something that I thought would have been generally useful, which doesn't add a lot of clutter and which would avoid others duplicating the effort. Since lemon is pure C, the integration of lemon into a C++ code has obviously never been a major concern.

Unfortunately, when I read lemon link I didn't interpret the statement "Lemon is maintained as part of the SQLite project" to mean that lemon would only have features that are also needed by SQLite, or that offering potential changes would have provoked this of reaction. Sorry for any offence that I may have caused.

(8) By Richard Hipp (drh) on 2020-06-29 20:07:47 in reply to 7 [link] [source]

I'm not offended.

Here are some thoughts:

  1. You have not successfully argued that the benefits of adding namespaces to Lemon outweigh the costs.

  2. The extra maintenance load on Lemon for adding namespaces, while not significant, is also non-zero. Long-term maintenance is a cost. We need to understand that this benefits outweigh this cost.

  3. Others have argued that namespaces are not really needed, as the identifiers generated by Lemon can be isolated by other means. (I forget the details of how that was suppose to be done, but I seem to recall reading it someplace.) This tends to reduce the benefit side of the inequality.

  4. The people (mostly me) who will need to maintain Lemon namespaces for the next 30 years do not derive any direct benefit from namespaces, and it fact do not understand how they might be useful. I tend to be unimpressed with "new and shiny" and like to see an actual real-world problem that the feature genuinely solves before adding it.

If you would like to see namespaces in Lemon, you will need to convince me that the cost and bother of supporting a feature that I do not use, for 30 years, is less than the societal benefit of installing that feature. In other words, you need to convince me that namespace support in Lemon is a net positive over a 30-year timespan. That has not yet been accomplished.

(13) By anonymous on 2020-07-07 12:13:22 in reply to 8 [link] [source]

Hi Richard - nice to read your response and willingness to discuss.

I'm not sure that I will immediately convince you (or indeed anyone) of the special problems associated with incorporating lemon into C++, but I will kickoff with what I hope is less controversial:

  • A new -e command line option to define the code extension.

By default this is c, but with this option can define something like -ecxx or even something like -efoobar. Being able to specify a different extension (other than .c) provides a means of invoking a different compiler (eg, C++) or triggering a special-purpose make rule that does some additional work.

The total size of the patch needed and the future maintenance are both modest (IMO) and improve versatility in the same way that the -d option does (very useful if you need it, low overhead/maintenance if you do not).

I figured on -e for extension, but it could be any other available letter.

--- lemon.c.orig        2020-01-10 14:08:19.181711107 +0100
+++ lemon.c     2020-01-10 14:12:00.621402133 +0100
@@ -1559,6 +1560,23 @@
   lemon_strcpy(outputDir, z);
 }
 
+/* Remember the name of the code extension (automatically prefix '.')
+*/
+static char *code_ext = NULL;
+static void handle_e_option(char *z){
+  code_ext = (char *) malloc( lemonStrlen(z)+2 );
+  if( code_ext==0 ){
+    fprintf(stderr,"out of memory\n");
+    exit(1);
+  }
+  if(*z == '.'){
+    lemon_strcpy(code_ext, z);
+  } else {
+    code_ext[0] = '.';
+    lemon_strcpy(&(code_ext[1]), z);
+  }
+}
+
 static char *user_templatename = NULL;
 static void handle_T_option(char *z){
   user_templatename = (char *) malloc( lemonStrlen(z)+1 );
@@ -1647,6 +1665,7 @@
     {OPT_FLAG, "c", (char*)&compress, "Don't compress the action table."},
     {OPT_FSTR, "d", (char*)&handle_d_option, "Output directory.  Default '.'"},
     {OPT_FSTR, "D", (char*)handle_D_option, "Define an %ifdef macro."},
+    {OPT_FSTR, "e", (char*)&handle_e_option, "Output code extension.  Default 'c'"},
     {OPT_FSTR, "f", 0, "Ignored.  (Placeholder for -f compiler options.)"},
     {OPT_FLAG, "g", (char*)&rpflag, "Print grammar without actions."},
     {OPT_FSTR, "I", 0, "Ignored.  (Placeholder for '-I' compiler options.)"},
@@ -4187,7 +4272,8 @@
 
   in = tplt_open(lemp);
   if( in==0 ) return;
-  out = file_open(lemp,".c","wb");
+  if(code_ext) out = file_open(lemp,code_ext,"wb");
+  else out = file_open(lemp,".c","wb");
   if( out==0 ){
     fclose(in);
     return;

Notes

The supplied extension can be with/without the dot. Can save a couple (2) of lines of code if we simply dictate that it is always without the dot.

Since we don't both with free() on any of the allocated option handlers, we could reduce even further, if that is desirable:

--- lemon.c.orig        2020-01-10 14:08:19.181711107 +0100
+++ lemon.c     2020-01-10 14:12:00.621402133 +0100
@@ -1559,6 +1560,23 @@
   lemon_strcpy(outputDir, z);
 }
 
+/* Remember the name of the code extension (automatically prefix '.')
+*/
+static char *code_ext = ".c";
+static void handle_e_option(char *z){
+  code_ext = (char *) malloc( lemonStrlen(z)+2 );
+  if( code_ext==0 ){
+    fprintf(stderr,"out of memory\n");
+    exit(1);
+  }
+  code_ext[0] = '.';
+  lemon_strcpy(&(code_ext[1]), z);
+}
+
 static char *user_templatename = NULL;
 static void handle_T_option(char *z){
   user_templatename = (char *) malloc( lemonStrlen(z)+1 );
@@ -1647,6 +1665,7 @@
     {OPT_FLAG, "c", (char*)&compress, "Don't compress the action table."},
     {OPT_FSTR, "d", (char*)&handle_d_option, "Output directory.  Default '.'"},
     {OPT_FSTR, "D", (char*)handle_D_option, "Define an %ifdef macro."},
+    {OPT_FSTR, "e", (char*)&handle_e_option, "Output code extension.  Default 'c'"},
     {OPT_FSTR, "f", 0, "Ignored.  (Placeholder for -f compiler options.)"},
     {OPT_FLAG, "g", (char*)&rpflag, "Print grammar without actions."},
     {OPT_FSTR, "I", 0, "Ignored.  (Placeholder for '-I' compiler options.)"},
@@ -4187,7 +4272,8 @@
 
   in = tplt_open(lemp);
   if( in==0 ) return;
-  out = file_open(lemp,".c","wb");
+  out = file_open(lemp,code_ext,"wb");
   if( out==0 ){
     fclose(in);
     return;

(16) By anonymous on 2020-07-07 14:15:07 in reply to 8 [link] [source]

Hi Richard,

As promised, the second followup.

The main issue that I've tried to solve was the use of global function names. Even if I use the %name directive to mangle them, there is still the potential of accidentally stepping on the names from other parser, which means that we need to have some_long_maybe_unique_ prefix. Beyond that, these Lemon routines should never be touched directly by a user once embedded in one or more libraries (in this case paranoid, not pedantic). I also really wanted to avoid exposing more of the Lemon interface than necessary, for fear of entering one of those header interdependencies like flex/bison/flex etc.

In the C++ world, these problems are usually solved with namespaces. If we solve the problem confined to a C domain (notwithstanding the -e option), we arrive at a simple solution that works equally well for me and should be good for you in terms of cost/benefit, and benefit others wanting to wrap lemon parsers into large libraries.

If we drop all of my fancy handling of begin/end namespace with anonymous or nesting and simply allow specification of the linkage type, the patched version of lempar.c becomes quite small.

--- lempar.c.orig       2020-07-07 15:27:12.506352634 +0200
+++ lempar.c    2020-07-07 15:37:58.957932854 +0200
@@ -88,6 +88,10 @@
 #ifndef INTERFACE
 # define INTERFACE 1
 #endif
+/* Default is global linkage. Alternative is 'static' */
+#ifndef LEMON_LINKAGE
+# define LEMON_LINKAGE
+#endif
 /************* Begin control #defines *****************************************/
 %%
 /************* End control #defines *******************************************/
@@ -251,6 +255,7 @@
 ** Outputs:
 ** None.
 */
+LEMON_LINKAGE
 void ParseTrace(FILE *TraceFILE, char *zTracePrompt){
   yyTraceFILE = TraceFILE;
   yyTracePrompt = zTracePrompt;
@@ -320,6 +325,7 @@
 
 /* Initialize a new parser that has already been allocated.
 */
+LEMON_LINKAGE
 void ParseInit(void *yypRawParser ParseCTX_PDECL){
   yyParser *yypParser = (yyParser*)yypRawParser;
   ParseCTX_STORE
@@ -359,6 +365,7 @@
 ** A pointer to a parser.  This pointer is used in subsequent calls
 ** to Parse and ParseFree.
 */
+LEMON_LINKAGE
 void *ParseAlloc(void *(*mallocProc)(YYMALLOCARGTYPE) ParseCTX_PDECL){
   yyParser *yypParser;
   yypParser = (yyParser*)(*mallocProc)( (YYMALLOCARGTYPE)sizeof(yyParser) );
@@ -427,6 +434,7 @@
 /*
 ** Clear all secondary memory allocations from the parser
 */
+LEMON_LINKAGE
 void ParseFinalize(void *p){
   yyParser *pParser = (yyParser*)p;
   while( pParser->yytos>pParser->yystack ) yy_pop_parser_stack(pParser);
@@ -444,6 +452,7 @@
 ** is defined in a %include section of the input grammar) then it is
 ** assumed that the input pointer is never NULL.
 */
+LEMON_LINKAGE
 void ParseFree(
   void *p,                    /* The parser to be deleted */
   void (*freeProc)(void*)     /* Function used to reclaim memory */
@@ -460,6 +469,7 @@
 ** Return the peak depth of the stack for a parser.
 */
 #ifdef YYTRACKMAXSTACKDEPTH
+LEMON_LINKAGE
 int ParseStackPeak(void *p){
   yyParser *pParser = (yyParser*)p;
   return pParser->yyhwm;
@@ -484,6 +494,7 @@
 ** Return the number of missed state/lookahead combinations.
 */
 #if defined(YYCOVERAGE)
+LEMON_LINKAGE
 int ParseCoverage(FILE *out){
   int stateno, iLookAhead, i;
   int nMissed = 0;
@@ -891,6 +902,7 @@
 ** Outputs:
 ** None.
 */
+LEMON_LINKAGE
 void Parse(
   void *yyp,                   /* The parser */
   int yymajor,                 /* The major token code number */
@@ -1065,6 +1077,7 @@
 ** Return the fallback token corresponding to canonical token iToken, or
 ** 0 if iToken has no fallback.
 */
+LEMON_LINKAGE
 int ParseFallback(int iToken){
 #ifdef YYFALLBACK
   assert( iToken<(int)(sizeof(yyFallback)/sizeof(yyFallback[0])) );

If this seems reasonable, then the only nice thing to change afterwards could be a Lemon directive (eg, %static?) to emit #define LEMON_LINKAGE static from lemon.c to restrict the type of things that people could write and reduce the number of pre-processor macros to document or remember.

For the C++ people, I think this is also a pretty good solution. In the end, I have only ever used an anonymous namespace for localising the lemon functions and that is very much what this is. Except that it works for C++ and C and it doesn't carve as deeply into lemon (even with an additional directive).

(9.2) By Larry Brasfield (LarryBrasfield) on 2020-06-29 20:29:26 edited from 9.1 in reply to 7 [link] [source]

I don't think any sorrow is called for. And the fact that Lemon was written and at times improved to suit the needs or convenience of SQLite project development does not imply that no features making it more generally useful would ever be welcome.

Getting globally visible names into namespaces is quite valuable, especially for components likely to be made part of large projects. The SQLite library upholds this virtue with its ubiquitous SQLITE_ or sqlite3_ prefixes on all such names, so it is clear that the project's owner recognizes the virtue.

I looked at your changes, and incorporated them into a private branch. I remember they were quite extensive, although well done and unlikely to create significant problems going forward. However, I wonder if the same benefit could not be achieved by suitably wrapping declarations and definitions into C++ namespace delineation constructs. The C++ semantics for the meaning of non-scoped names makes this easy to do, non-invasively as I recall. It seems to me that emitting namespace creation/reference wrappers and #include'ing or textually copying the C++-ignorant code, (perhaps with slight changes to make it acceptable to C++ compilers [a]), would present a much lower prospective or actual maintenance burden and attain the same objective for C++ projects.

[a. I refer to some C++ requirements which would have made C a better tool from the beginning without giving it all the C++ features programmers have learned to love and/or dread. ]

Another alteration that seemed possibly more transparent at the working-code level is to simply make the choice between C-style pseudo-namespace names or C++-style genuine-namespace names one which is not visible in the code except by appearance of preprocessor macros. After all, either MyNamespace:: or MYNAMESPACE_ can be token-pasted onto the beginning of names with quite similar effect.

This is not quite a toss from the peanut gallery as I have performed the suggested namespace construct wrapping on a library which badly needed it because it did not play well with others. The prefix pasting idea is admittedly untested.

(15) By anonymous on 2020-07-07 13:46:11 in reply to 9.2 [link] [source]

Hi Larry,

After all, either MyNamespace:: or MYNAMESPACE_ can be token-pasted onto the beginning of names with quite similar effect.

...

The prefix pasting idea is admittedly untested.

Token pasting of names works partly but with some issues.

  • Cannot have %name my::name:: without upsetting Lemon itself.

  • Defining my::name::ParseFree directly will fail since it would need to be declared first within a `namespace my { namespace name { ... } }' before it can be define. Defining it directly will not compile.

In the meantime, I have looked over another response for Richard.

(17) By Larry Brasfield (LarryBrasfield) on 2020-07-07 15:28:14 in reply to 15 [link] [source]

Token pasting of names works partly but with some issues.

  • Cannot have %name my::name:: without upsetting Lemon itself.

That is surely remedied with a tiny change, perhaps even a code reduction.

  • Defining my::name::ParseFree directly will fail since it would need to be declared first within a `namespace my { namespace name { ... } }' before it can be define. Defining it directly will not compile.

True, although that's where the wrapping I suggested comes into play. If, for a given compilation unit, it is all wrapped as '...' is above, except for the external dependencies, (such as #include <stdio.h>), then the declarations and subsequent definitions are in the same namespace. And, I dare say, the wrapper injection needs to appear in just a few places.

In the meantime, I have looked over another response for Richard.

I wish you good luck with that. His LALR parser is a joy to use (and read rules written for) compared to yacc or bison. I once adapted bison for C++ use, and it appears that your changes are less invasive for Lemon than would likely be the case for those other venerable tools. Please appreciate that my suggestions are not intended as "a better way" from a pure programming perspective. Rather, I had hoped that an even less extensive set of changes would enhance the prospects for Lemon becoming a more general purpose tool without first splitting from the SQLite project.

(18) By anonymous on 2020-07-07 17:44:53 in reply to 17 [link] [source]

His LALR parser is a joy to use (and read rules written for) compared to yacc or bison.

100% agreed!

Please appreciate that my suggestions are not intended as "a better way" from a pure programming perspective. Rather, I had hoped that an even less extensive set of changes would enhance the prospects for Lemon becoming a more general purpose tool without first splitting from the SQLite project.

Yes, that was exactly what I was originally aiming for. It took another nudge from Richard for me to realise that since I was only ever using an anonymous namespace anyhow, we could just as well just support static for internal linkage and the job is already done.

I am personally not a fan of the various C++ wrapping for these types of tools anyhow. They are either awkward, out-of-sync or don't match anything close to my usage. With Lemon, I found that I can pack all of my heavy data into my own classes as %extra_context, define my own %token_type and then the rest of handling Lemon in my class is trivial (max 10 lines of wrapping).

(10) By Warren Young (wyoung) on 2020-06-29 21:11:32 in reply to 7 [link] [source]

The additional maintenance load for me to carry the patch is not the point - I'm not sure what I said that gave that impression.

That's the standard reason to upstream a patch.

Part of the reason I assumed you didn't want to carry the maintenance burden any more is that you've repeated your wish for SQLite to adopt this change. How is that not saying, "We don't want to carry this patch any more"? (Rhetorical. Just explaining my thought process.)

offering potential changes would have provoked this of reaction

Since you're replying to me, I've got to say that you sound hurt and offended yourself, but I've re-read my reply, and I don't see how you get that.

My reply can be distilled into two major points, purely factual:

  1. The IP organization behind SQLite can't legally accept it even if they thought it was the most awesome feature in the history of software development.

  2. Even with the legal issues aside, drh has to want to accept it, which he covers well in his reply to you.

(11) By ddevienne on 2020-06-29 22:18:18 in reply to 10 [link] [source]

The IP organization behind SQLite can't legally accept it

You seem to imply lemon.c is legally part of SQLite.

But given how it's nowhere to be found in the amalgamation,
that's not exactly obvious IMHO. I seem to recall DRH accepting
fixes to Lemon in the past. This may be a little different, since
it's a new feature, not a fix, still Lemon may have more leeway
for external contributions, than SQLite proper.

In any case, DRH is not convienced yet. I do believe better support
for C++ in Lemon-generated code is a good idea, and namespaces are
an important tool in C++ (not a new one). Perhaps there are other work
arounds as Richard suggest, but then don't seem readily available, while
direct support in the code makes it obvious how to gain that support.

Is the initial impetus for the patch avoiding global symbols?
Or to support several grammars in the same library?
Else?

Perhaps better stating of goals might sway Richard. FWIW.

(14) By anonymous on 2020-07-07 12:39:59 in reply to 10 [link] [source]

Since you're replying to me, I've got to say that you sound hurt and offended yourself, but I've re-read my reply, and I don't see how you get that.

I reckon I read too much into your initial response (or some phrases have different regional connotations) - pardon.

(12) By Domingo (mingodad) on 2020-07-02 18:25:48 in reply to 1 [link] [source]

Just in case anyone is interested I did some refactoring on lemon and replaced all globals to allow create a library with it.

You can get it here https://github.com/mingodad/ljs/tree/master/lua2ljs .