Thoughts on replacing macros with static inline functions?

(1.1) By PappasBrent on 2022-11-14 16:08:32 edited from 1.0

Hi all,

I've noticed that SQLite code sometimes uses macros where a static inline function would appear to work equally well. For instance, consider the macro ISLOWER defined in lemon.c:

#define ISLOWER(X) islower((unsigned char)(X))

I imagine this could be turned into a function like so:

int ISLOWER(char X) { return islower((unsigned char)(X)); }

The reason to do this is that functions are often easier to reason about than macros: they have static scoping, evaluate each argument exactly once, and are typed. So it may be worthwhile to turn such macros into functions to keep developers from accidentally falling into common macro pitfalls.
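To make the "evaluate their arguments once" point concrete, here is a minimal sketch. SQUARE_MACRO, square_fn and next_value are hypothetical names made up for this illustration; none of them come from the SQLite sources. The macro pastes its argument text in twice, so a side-effecting argument runs twice, while the function version runs it once.

```c
#include <assert.h>

/* Hypothetical macro: the argument text is pasted in twice. */
#define SQUARE_MACRO(X) ((X) * (X))

/* Function version: the argument is evaluated once, then passed by value. */
static inline int square_fn(int x) { return x * x; }

/* Hypothetical helper with a visible side effect: counts its own calls. */
static int call_count = 0;
static int next_value(void) { call_count++; return 3; }

/* How many times does next_value() run when passed through the macro? */
static int calls_via_macro(void) {
  call_count = 0;
  (void)SQUARE_MACRO(next_value());
  return call_count;
}

/* Same question for the function version. */
static int calls_via_function(void) {
  call_count = 0;
  (void)square_fn(next_value());
  return call_count;
}

/* The asserts document the expected behaviour. */
static void demo_double_evaluation(void) {
  assert(calls_via_macro() == 2);    /* argument expanded, and run, twice */
  assert(calls_via_function() == 1); /* argument run exactly once */
}
```

Note that static inline needs a C99 (or later) compiler, so this sketch would not build in strict C89 mode.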

I'm curious, would the SQLite developers consider porting such macros to inline functions in the future?

(2) By Gunter Hick (gunter_hick) on 2022-11-14 15:42:05 in reply to 1.0

No, not really.

The macro ISLOWER is calling the function islower() with its argument typecast to unsigned char.

Your proposed code accepts a char, casts it to unsigned char, and returns that as an int, without ever checking if the argument corresponds to a lowercase character.

(3) By PappasBrent on 2022-11-14 16:08:46 in reply to 2

Oh good catch! I accidentally forgot to call islower in the second code snippet. I've updated my original post to correct this error.

Besides that though, may I ask why there's little interest in replacing macros with functions? Does it not seem worth the effort, or is there some other reason? I'm genuinely curious.

(4) By Richard Hipp (drh) on 2022-11-14 16:19:16 in reply to 3

What would such a change accomplish? How does it move us forward?

(10) By anonymous on 2023-03-02 16:03:11 in reply to 4

"What would such a change accomplish?"

I can answer this for myself at least:

TL;DR it allowed me to scale sqlite query complexity significantly which increased sqlite's utility and applicability to new problems.

I have long used a pre-processor mechanism I wrote/re-wrote to make my sqlite3 life easier. I am generally using sqlite3 embedded in other applications that generate sqlite queries programmatically from C/C++ or another language (usually based on a pattern query whose components are chosen/replaced at run-time).

These are often large stand-alone DBs (1-100GB+), so indexing and query structure are critical for performance and run-time memory footprint control. Overcoming the limitations of sqlite's query optimizer often required peculiar query structures (trading deep joins for complex sub-selects for example, or collapsing a macro argument to a value outside the query to prevent re-execution of a sub-select billions of times).

Over time it has been a lex/yacc, then flex/bison, and now flex/lemon text pre-processor, integrated into shell.c and also wrapped around sqlite3_prepare_v2() in my applications.

The basic outline of syntax is:
set name(arg,list) = *; -- (args) may be present or not, used or not by the definition.
clear name; -- delete the variable/macro in a shell session | .spclear for all

It operates similarly to CPP in that it can be a coherent expression/function or it can be a brute-force text/fragment replacement.

(oversimplified examples):
set a=42; -- simple readability variable
select foo from bar where condition=a;

set transform(a,b,c) =(cast(a as int) + cast(b as int) - cast(c as int));
select transform(1,2,3) as t from foo;

Where this gets very useful is when queries get quite complex:
- multi-way joins with NULL tolerance/manipulation
- extensive casting/error checking (case statements)
- complex sub-selects that make the query otherwise unreadable
- case statements generally make complex queries unreadable just from the sheer depth of context one must track as you read - vs the hierarchical nature of a macro instance

ex:
set complex_decision1(a,b) = (select foo from p join q on a in(select ...;
(replace ... with complexity of your choice)

select
     ... -- elide for brevity
where 
     complex_decision1(var1,var2) -- macro based choice based on whatever
     AND
     complex_decision2(var3)

In short, this capability allowed me to scale up sqlite query complexity radically while still being able to read both human-written and programmatically generated query code that uses this macro idiom.

I added a shell flag to show the expanded query in final expanded form for debug.  I preserved formatting and comments to maximize the chances the error print would show the error in the context of the abstract/pre-expansion code to help you find it.

I realize and have experienced the many errors one can introduce via this mechanism, but they were an easy trade-off against trying to read/write queries whose "select" arguments alone could be thousands of characters once all casting and sub-selects were factored in.

Whether or not the macros persist in the shell or are stored in the DB itself was never a concern for me as I was embedding sqlite in other applications.

I also realize there are half a dozen other ways to do this... this one has repeatedly offered me the most maintainable and rapid solution. So, having a similarly concise way to manipulate variables/macros would be a very welcome addition to sqlite for me.

(11) By ddevienne on 2023-03-02 16:38:07 in reply to 10

Interesting. Thanks for sharing.

Does it nest? i.e. can macros use other macros?
Is it purely textual? I.e. can macros be as silly as "from t where"? As opposed to being correct clauses or expressions by themselves?
Why use a syntax that makes it difficult to distinguish from regular SQL function call syntax?
E.g. in Apache-Ant I lobbied for macro-expansion to use @name to distinguish from $name, property-expansion.

(12) By anonymous on 2023-03-02 17:25:57 in reply to 11

- nesting - yes... there's an arbitrary 100 limit on recursion depth through the token expansion process to prevent run-away... it could/should be more elegant than that, but.

- textual? - yes, token based, using flex/lemon to very crudely sort out tokens without getting mired in full SQL syntax parsing (for risk of fouling up the main keyword commands: select, insert, create, etc.). The right-hand side of the token can be an arbitrary code fragment. The only restriction at the moment is that it can't have mismatched parens, because the basic lemon expression rule expr ::= LP expr RP. is core to its pattern matching in my .y file.

- why use a syntax that is ambiguous with SQL? Mostly expediency... were I trying to generalize this to inclusion with sqlite, I might re-think that as sqlite's namespace generally is pretty ambiguous already, but the extra characters are extra... not needed to get the job done - 99.9%.  cpp was my mental model and it relies on you to sort that out on your own.

It _could_ either have marker characters or do a better job checking for circular references/ambiguity.
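For readers who want to see the shape of depth-limited expansion, here is a minimal sketch in C. It is not the flex/lemon implementation described above: it is a hand-rolled word scanner, it handles only argument-less macros, and it knows nothing about SQL strings or comments. The macro table, MAX_DEPTH, expand and expand_sql are all hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 100 /* same arbitrary cap as described above */

struct macro { const char *name; const char *body; };

/* Hypothetical macro table; a real tool would fill this from "set" lines. */
static const struct macro macros[] = {
  { "a",    "42" },
  { "cond", "(condition = a)" }, /* nested: uses macro a */
  { "loop", "loop" },            /* circular: only the depth cap stops it */
};

/* Return the body of the macro named by the n bytes at s, or NULL. */
static const char *lookup(const char *s, size_t n) {
  size_t i;
  for (i = 0; i < sizeof macros / sizeof macros[0]; i++) {
    if (strlen(macros[i].name) == n && memcmp(macros[i].name, s, n) == 0) {
      return macros[i].body;
    }
  }
  return NULL;
}

static int is_word_char(char c) {
  return isalnum((unsigned char)c) || c == '_';
}

/* Append the expansion of in to out (capacity cap, current length *len).
   Returns 0 on success, -1 if out is too small or nesting is too deep. */
static int expand(const char *in, char *out, size_t cap, size_t *len,
                  int depth) {
  if (depth > MAX_DEPTH) return -1;
  while (*in) {
    size_t n = 1;
    const char *body = NULL;
    if (is_word_char(*in)) {
      while (is_word_char(in[n])) n++;          /* take the whole word */
      if (!isdigit((unsigned char)*in)) body = lookup(in, n);
    }
    if (body) {
      if (expand(body, out, cap, len, depth + 1) != 0) return -1;
    } else {
      if (*len + n >= cap) return -1;           /* keep room for the NUL */
      memcpy(out + *len, in, n);
      *len += n;
    }
    in += n;
  }
  out[*len] = '\0';
  return 0;
}

/* Convenience wrapper: expand a whole statement into out. */
static int expand_sql(const char *in, char *out, size_t cap) {
  size_t len = 0;
  if (cap == 0) return -1;
  return expand(in, out, cap, &len, 0);
}

/* The asserts document the expected behaviour. */
static void demo_expand(void) {
  char buf[64];
  assert(expand_sql("select foo from bar where cond;", buf, sizeof buf) == 0);
  assert(strcmp(buf, "select foo from bar where (condition = 42);") == 0);
  assert(expand_sql("select loop;", buf, sizeof buf) == -1);
}
```

A circular definition is not detected as such here; it simply runs into the depth cap and the whole expansion is rejected, which matches the "prevent run-away" behaviour but not the "more elegant" checking wished for above.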

set ab(n) = (select a + b from f where rowid=n);
set aa(n) = (select a + a from f where rowid=n);
set bb(n) = (select b + b from f where rowid=n);
set sm(a,b)=(select sum(aa(a))+sum(bb(b)) from f);

INPUT:
select
     ab(3) as a,
     ab(2) as b,
     aa(7) as c,
     bb(5) as d,
     sm(2,3)   as s
;

OUTPUT (with .spexpand shell flag set to "on" to show expanded prior to query)
-- set ab(n) = (select a + b from f where rowid=n);
-- set aa(n) = (select a + a from f where rowid=n);
-- set bb(n) = (select b + b from f where rowid=n);
-- set sm(a,b)=(select sum(aa(a))+sum(bb(b)) from f);
select
     (select a + b from f where rowid=3) as a,
     (select a + b from f where rowid=2) as b,
     (select a + a from f where rowid=7) as c,
     (select b + b from f where rowid=5) as d,
     (select sum((select 2 + 2 from f where rowid=2))+sum((select 3 + 3 from f where rowid=3)) from f)   as s
;
a   b  c   d   s 
--  -  --  --  --
11  7  26  20  70


INPUT (silly example - but to your "textual?" question):
set fromclause=
from f
where
   b > a
;
select rowid fromclause();

OUTPUT:
1
2
3
4
5
6
7

(13) By anonymous on 2023-03-02 17:47:47 in reply to 12

"(select sum((select 2 + 2 from f where rowid=2))+sum((select 3 + 3 from f where rowid=3)) from f)   as s"

Showing the dangers of ambiguity... note "a" and "b" are both column names and macro variable names.  Big boy rules apply in my example to avoid this.  I often use _NAME or some such to avoid this... but this was a quick mockup.

(14) By Donal Fellows (dkfellows) on 2023-03-03 16:32:55 in reply to 4

In general, if using a C language profile that supports them, using static inline functions allows the inlining of code of quite a bit greater complexity than it is normally practical to do with macros. In particular, you have a sane local naming scope, which simplifies some things substantially. On the other hand, you have fixed types and there are things that you simply can't express that way (such as a nice foreach pseudo-language-construct). It's possible to work around the type thing with _Generic but that's getting into the hairier parts of newer versions of the C standard.
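As a rough illustration of the _Generic workaround, the sketch below pairs typed static inline helpers with a thin dispatching macro. It assumes a C11 compiler; MAX_OF, max_int and max_double are hypothetical names invented for this example.

```c
#include <assert.h>

/* Typed helpers: each has its own local scope and evaluates its
   arguments exactly once. */
static inline int max_int(int a, int b) { return a > b ? a : b; }
static inline double max_double(double a, double b) { return a > b ? a : b; }

/* C11 generic selection picks a helper from the static type of the first
   argument only. The controlling expression (A) is not evaluated, so each
   argument still runs once. A type with no matching entry is a compile
   error. */
#define MAX_OF(A, B) \
  _Generic((A), int: max_int, double: max_double)((A), (B))

/* The asserts document the expected behaviour. */
static void demo_generic(void) {
  assert(MAX_OF(2, 7) == 7);       /* dispatches to max_int */
  assert(MAX_OF(1.5, 0.5) == 1.5); /* dispatches to max_double */
}
```

The macro is still a macro, but all the real logic lives in ordinary functions, which is about as far as this trick goes; something like a foreach construct still cannot be expressed this way.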

If you're sticking to strict C89 compatibility then you don't have static inline in the first place.

(I use static inline extensively in one of my projects. It works very well, as it means that the compiler does good dead code elimination on even just moderate levels of optimisation. You get the same sort of thing with macros, except then you have to do all sorts of futzing around to make sure you don't tread on symbols defined by the caller or other macros; it's possible to do all that, but that level of macro-hackery tends to make folks nauseous.)

(5.1) By Larry Brasfield (larrybr) on 2022-11-14 16:23:05 edited from 5.0 in reply to 3

"why there's little interest in replacing macros with functions?"

The developers are well aware of the pitfalls macro usage may present. They are generally wary of coding rules that go beyond style and readability and remove the need for judgment as to effect.

To replace parameterized macro usages with function calls generally, (without regard for case by case circumstances), would exact a runtime cost. The C language does not provide for defining inline functions; parameterized macros are the nearest (and admittedly inferior) substitute.

As for "worth the effort": A great deal of effort goes toward keeping the code clear. Some of that is accomplished with coding conventions such as using upper case for macros that may not act like functions. The reason your suggestion languishes is going to primarily be a lack of agreement that it represents an improvement rather than an assessment related to revision effort. That said, it would be a lot of effort for negligible return, IMHO.

(6.1) By PappasBrent on 2022-11-14 16:41:09 edited from 6.0 in reply to 5.1

Thank you for the thoughtful response. I have one question about your reply though:

"The C language does not provide for defining inline functions"

Are you referring to ANSI C? I assume so since that's the standard that SQLite uses, but I just want to be sure.

(7) By Larry Brasfield (larrybr) on 2022-11-14 16:46:02 in reply to 6.0

The project is maintained such that its final build product can be compiled with C89. As I recall, that is what is most often meant by "ANSI C". This C89 compatibility is intended to make sqlite3 available for a very wide variety of machines.
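One common way for a code base to stay C89-buildable while still benefiting from inline on newer compilers is a small portability macro. The sketch below is an illustration only and is not taken from the SQLite sources; MY_INLINE and is_lower_char are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>

/* Use "static inline" only when the compiler announces C99 or later.
   A strict C89 compiler does not define __STDC_VERSION__ at all, so it
   falls back to a plain static function. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
# define MY_INLINE static inline
#else
# define MY_INLINE static
#endif

/* Function form of the ISLOWER idea; returns 1 or 0. */
MY_INLINE int is_lower_char(char c) {
  return islower((unsigned char)c) != 0;
}

/* The asserts document the expected behaviour. */
static void demo_portable_inline(void) {
  assert(is_lower_char('a') == 1);
  assert(is_lower_char('A') == 0);
}
```

On a pre-C99 compiler the fallback is an ordinary static function, so whether the call is actually inlined is left entirely to that compiler's optimizer.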

(8) By PappasBrent on 2022-11-14 16:51:27 in reply to 7

That's what I assumed, thank you for clarifying.

(15) By anonymous on 2023-11-28 14:05:42 in reply to 6.1

Recent C standards, e.g. C99, do have inline functions.

If you want details, contact me (Basile Starynkevitch in France near Paris, at 92340 Bourg-la-Reine) by email to basile@starynkevitch.net

Recent C compilers (including GCC and Clang/LLVM ...) are indeed inlining functions.

Maybe SQLite should be migrated to C99 or better?

Regards.

(16) By Richard Hipp (drh) on 2023-11-28 14:17:51 in reply to 15

Just in the past release cycle, we tried to use the C99 snprintf() function in the CLI, because MacOS has deprecated sprintf(). This caused grief for many users and so we had to come up with an alternative to using snprintf(). From this recent experience we know that there are still a lot of people out there who depend upon pre-C99 compilers and for various reasons are unable to upgrade.

Your suggestion would mean abandoning those users. And for what benefit?

(9) By cj (sqlitening) on 2022-11-25 14:50:46 in reply to 1.1

MACRO MONEY(colname)  = "printf('%.2f',"       + colname + "*.01)"
MACRO DATE8(colname)  = "strftime('%m-%d-',"   + colname+ ") || substr(strftime('%Y',"+colname+"),3,2)"
MACRO DATE10(colname) = "strftime('%m-%d-%Y', "+ colname+")"

FUNCTION PBMAIN 'written in PowerBASIC  sql statement pre-processor

 slOpen ":memory:"
 slexe  "create table t1(c1 integer,c2)"
 slexe  "insert into t1 values('123','2022-11-25')"
 sql =  "MONEY(c1),DATE8(c2)"
 GetRs  sql

 '1.23   11-25-22

END FUNCTION

SUB GetRS(sql AS STRING)
 DIM sMacro(1 TO 3) AS STRING
 sMacro(1) = $MONEY                                     
 sMacro(2) = $DATE8                                     
 sMacro(3) = $DATE10                                    
 For x = 1 TO 3                                         'for 
  sMacro=sMacro(x)                                       'macro name found in sql
  Do                                                    ' do
   startword = INSTR(sql,sMacro)                        '  Start position
   IF startword = 0 THEN EXIT DO                        '  not found, exit do
   endword = INSTR(startword,sql,")")                   '  end position
   sFunc = MID$(sql,startword,endword-startword+1)      '  sFunc = macro name
   sCol = LEFT$(MID$(sFunc,LEN(sMacro)+1),-1)           '  sCol =  colname
   SELECT CASE sMacro                                   '
    CASE $MONEY :REPLACE sFunc WITH  MONEY(sCol) IN sql '  replace with money(colname)
    CASE $DATE8 :REPLACE sFunc WITH  DATE8(sCol) IN sql '  replace with date8(colname)
    CASE $DATE10:REPLACE sFunc WITH DATE10(sCol) IN sql '  replace with date10(colname)
   END SELECT                                           
  Loop                                                  ' loop
 Next                                                   'next    
END SUB