SQLite

Check-in [a7ecb2f4]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Small size and complexity reduction on the star-query heuristic. Improved comments for the star-query heuristic.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: a7ecb2f4b7eee78b88f1b2e026dffed2007ca4ffeb152632624ab2582839b250
User & Date: drh 2025-01-26 17:29:33
Context
2025-01-26
20:09
Further comment improvements in the star-query heuristic. Add an ALWAYS() on an unreachable branch to achieve MC/DC. (check-in: 5e18ce68 user: drh tags: trunk)
17:29
Small size and complexity reduction on the star-query heuristic. Improved comments for the star-query heuristic. (check-in: a7ecb2f4 user: drh tags: trunk)
2025-01-25
23:04
Revise the strategy used by the star-query heuristic: Instead of decreasing the cost of all fact-table WhereLoops, increase the run-cost of WhereLoops that are SCANs of dimension tables. (check-in: 1bc09c9e user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to src/sqliteInt.h.
877
878
879
880
881
882
883


884
885
886
887
888
889
890
**
** The LogEst can be negative to indicate fractional values.
** Examples:
**
**    0.5 -> -10           0.1 -> -33        0.0625 -> -40
*/
typedef INT16_TYPE LogEst;



/*
** Set the SQLITE_PTRSIZE macro to the number of bytes in a pointer
*/
#ifndef SQLITE_PTRSIZE
# if defined(__SIZEOF_POINTER__)
#   define SQLITE_PTRSIZE __SIZEOF_POINTER__







>
>







877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
**
** The LogEst can be negative to indicate fractional values.
** Examples:
**
**    0.5 -> -10           0.1 -> -33        0.0625 -> -40
*/
typedef INT16_TYPE LogEst;
#define LOGEST_MIN (-32768)
#define LOGEST_MAX (32767)

/*
** Set the SQLITE_PTRSIZE macro to the number of bytes in a pointer
*/
#ifndef SQLITE_PTRSIZE
# if defined(__SIZEOF_POINTER__)
#   define SQLITE_PTRSIZE __SIZEOF_POINTER__
Changes to src/where.c.
5447
5448
5449
5450
5451
5452
5453
5454
5455
5456
5457
5458
5459
5460
5461
5462

5463
5464
5465

5466
5467
5468
5469
5470
5471
5472


















5473
5474
5475
5476
5477
5478
5479
5480
5481
5482
5483
5484
5485
5486
5487
**
**     18    for star queries
**     12    otherwise
**
** For the purposes of this heuristic, a star-query is defined as a query
** with a large central table that is joined using an INNER JOIN,
** not CROSS or OUTER JOINs, against four or more smaller tables.
*  The central table is called the "fact" table.  The smaller tables
** that get joined are "dimension tables".  Also, any table that is
** self-joined cannot be a dimension table; we assume that dimension
** tables may only be joined against fact tables.
**
** SIDE EFFECT:  (and really the whole point of this subroutine)
**
** If pWInfo describes a star-query, then the cost for SCANs of dimension
** WhereLoops is increased to be slightly larger than the cost of a SCAN

** in the fact table.  This heuristic helps keep fact tables in
** outer loops.  Without this heuristic, paths with fact tables in outer
** loops tend to get pruned by the mxChoice limit on the number of paths,

** resulting in poor query plans.  The cost adjustment for each WhereLoop
** is stored in its rStarDelta field.
**
** This heuristic can be completely disabled, so that no query is
** considered a star-query, using SQLITE_TESTCTRL_OPTIMIZATION to
** disable the SQLITE_StarQuery optimization.  In the CLI, the command
** to do that is:  ".testctrl opt -starquery".


















*/
static int computeMxChoice(WhereInfo *pWInfo){
  int nLoop = pWInfo->nLevel;    /* Number of terms in the join */
  WhereLoop *pWLoop;             /* For looping over WhereLoops */

#ifdef SQLITE_DEBUG
  /* The star-query detection code below makes use of the following
  ** properties of the WhereLoop list, so verifying them before
  ** continuing:
  **    (1)  .maskSelf is the bitmask corresponding to .iTab
  **    (2)  The WhereLoop list is in ascending .iTab order
  */
  for(pWLoop=pWInfo->pLoops; pWLoop; pWLoop=pWLoop->pNextLoop){
    assert( pWLoop->maskSelf==MASKBIT(pWLoop->iTab) );
    assert( pWLoop->pNextLoop==0 || pWLoop->iTab<=pWLoop->pNextLoop->iTab );







|








>
|
|
|
>
|
<





>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







|







5447
5448
5449
5450
5451
5452
5453
5454
5455
5456
5457
5458
5459
5460
5461
5462
5463
5464
5465
5466
5467
5468

5469
5470
5471
5472
5473
5474
5475
5476
5477
5478
5479
5480
5481
5482
5483
5484
5485
5486
5487
5488
5489
5490
5491
5492
5493
5494
5495
5496
5497
5498
5499
5500
5501
5502
5503
5504
5505
5506
**
**     18    for star queries
**     12    otherwise
**
** For the purposes of this heuristic, a star-query is defined as a query
** with a large central table that is joined using an INNER JOIN,
** not CROSS or OUTER JOINs, against four or more smaller tables.
** The central table is called the "fact" table.  The smaller tables
** that get joined are "dimension tables".  Also, any table that is
** self-joined cannot be a dimension table; we assume that dimension
** tables may only be joined against fact tables.
**
** SIDE EFFECT:  (and really the whole point of this subroutine)
**
** If pWInfo describes a star-query, then the cost for SCANs of dimension
** WhereLoops is increased to be slightly larger than the cost of a SCAN
** in the fact table.  Only SCAN costs are increased.  SEARCH costs are
** unchanged. This heuristic helps keep fact tables in outer loops. Without
** this heuristic, paths with fact tables in outer loops tend to get pruned
** by the mxChoice limit on the number of paths, resulting in poor query
** plans.  See the starschema1.test test module for examples of queries
** that need this heuristic to find good query plans.

**
** This heuristic can be completely disabled, so that no query is
** considered a star-query, using SQLITE_TESTCTRL_OPTIMIZATION to
** disable the SQLITE_StarQuery optimization.  In the CLI, the command
** to do that is:  ".testctrl opt -starquery".
**
** HISTORICAL NOTES:
**
** This optimization was first added on 2024-05-09 by check-in 38db9b5c83d.
** The original optimization reduced the cost and output size estimate for
** fact tables to help them move to outer loops.  But months later (as people
** started upgrading) performance regression reports started caming in,
** including:
**
**    forum post b18ef983e68d06d1 (2024-12-21)
**    forum post 0025389d0860af82 (2025-01-14)
**    forum post d87570a145599033 (2025-01-17)
**
** To address these, the criteria for a star-query was tightened to exclude
** cases where the fact and dimensions are separated by an outer join, and
** the affect of star-schema detection was changed to increase the rRun cost
** on just full table scans of dimension tables, rather than reducing costs
** in the all access methods of the fact table.
*/
static int computeMxChoice(WhereInfo *pWInfo){
  int nLoop = pWInfo->nLevel;    /* Number of terms in the join */
  WhereLoop *pWLoop;             /* For looping over WhereLoops */

#ifdef SQLITE_DEBUG
  /* The star-query detection code below makes use of the following
  ** properties of the WhereLoop list, so verify them before
  ** continuing:
  **    (1)  .maskSelf is the bitmask corresponding to .iTab
  **    (2)  The WhereLoop list is in ascending .iTab order
  */
  for(pWLoop=pWInfo->pLoops; pWLoop; pWLoop=pWLoop->pNextLoop){
    assert( pWLoop->maskSelf==MASKBIT(pWLoop->iTab) );
    assert( pWLoop->pNextLoop==0 || pWLoop->iTab<=pWLoop->pNextLoop->iTab );
5496
5497
5498
5499
5500
5501
5502
5503

5504
5505
5506
5507
5508
5509
5510
5511
5512
    int iFromIdx;          /* Term of FROM clause is the candidate fact-table */
    Bitmask m;             /* Bitmask for candidate fact-table */
    Bitmask mSelfJoin = 0; /* Tables that cannot be dimension tables */       
    WhereLoop *pStart;     /* Where to start searching for dimension-tables */

    pWInfo->bStarDone = 1; /* Only do this computation once */

    /* Check to see if we are dealing with a star schema and if so, reduce

    ** the cost of fact tables relative to dimension tables, as a heuristic
    ** to help keep the fact tables in outer loops.
    */
    assert( !pWInfo->bStarUsed );
    aFromTabs = pWInfo->pTabList->a;
    pStart = pWInfo->pLoops;
    for(iFromIdx=0, m=1; iFromIdx<nLoop; iFromIdx++, m<<=1){
      int nDep = 0;             /* Number of dimension tables */
      LogEst mxRun;             /* Maximum SCAN cost of a fact table */







|
>
|
|







5515
5516
5517
5518
5519
5520
5521
5522
5523
5524
5525
5526
5527
5528
5529
5530
5531
5532
    int iFromIdx;          /* Term of FROM clause is the candidate fact-table */
    Bitmask m;             /* Bitmask for candidate fact-table */
    Bitmask mSelfJoin = 0; /* Tables that cannot be dimension tables */       
    WhereLoop *pStart;     /* Where to start searching for dimension-tables */

    pWInfo->bStarDone = 1; /* Only do this computation once */

    /* Check to see if we are dealing with a star schema and if so, adjust
    ** SCAN cost of dimensino tables so that they are as large as the SCAN
    ** cost of the fact table.  This is a heuristic that helps keep the
    ** fact tables in outer loops.
    */
    assert( !pWInfo->bStarUsed );
    aFromTabs = pWInfo->pTabList->a;
    pStart = pWInfo->pLoops;
    for(iFromIdx=0, m=1; iFromIdx<nLoop; iFromIdx++, m<<=1){
      int nDep = 0;             /* Number of dimension tables */
      LogEst mxRun;             /* Maximum SCAN cost of a fact table */
5539
5540
5541
5542
5543
5544
5545
5546
5547
5548
5549
5550
5551
5552
5553
5554
5555
5556
5557
5558
5559


5560
5561









5562
5563
5564
5565
5566
5567
5568
5569
5570
5571
5572
5573
5574
5575
5576
5577
5578

5579
5580
5581
5582
5583
5584
5585
5586
5587
5588
          }else{
            nDep++;
            mSeen |= pWLoop->maskSelf;
          }
        }
      }
      if( nDep<=3 ) continue;

      /* Compute the maximum cost of any WhereLoop for the iFromIdx-th term */
      mxRun = -32767;
      for(pWLoop=pStart; pWLoop; pWLoop=pWLoop->pNextLoop){
        if( pWLoop->iTab<iFromIdx ) continue;
        if( pWLoop->iTab>iFromIdx ) break;
        if( pWLoop->rRun>mxRun ) mxRun = pWLoop->rRun;
      }

      /* Make sure rStarDelta values are initialized */
      if( !pWInfo->bStarUsed ){
        for(pWLoop=pWInfo->pLoops; pWLoop; pWLoop=pWLoop->pNextLoop){
          pWLoop->rStarDelta = 0;
        }


        pWInfo->bStarUsed = 1;
      }










      /* Increase the cost of table scans for dimension tables to be slightly
      ** more than the maximum cost of fact table */
      for(pWLoop=pStart; pWLoop; pWLoop=pWLoop->pNextLoop){
        if( (pWLoop->maskSelf & mSeen)==0 ) continue;
        if( pWLoop->nLTerm ) continue;
        if( pWLoop->rRun<=mxRun ){
#ifdef WHERETRACE_ENABLED /* 0x80000 */
          if( sqlite3WhereTrace & 0x80000 ){
            SrcItem *pDim = aFromTabs + pWLoop->iTab;
            sqlite3DebugPrintf(
              "Increase SCAN cost of dimension %s(%d) of fact %s(%d) to %d\n",
              pDim->zAlias ? pDim->zAlias: pDim->pSTab->zName, pWLoop->iTab,
              pFactTab->zAlias ? pFactTab->zAlias : pFactTab->pSTab->zName,
              iFromIdx, mxRun+1
            );
          }

#endif /* WHERETRACE_ENABLED */
          pWLoop->rStarDelta = mxRun+1 - pWLoop->rRun;
          pWLoop->rRun = mxRun+1;
        }
      }
    }
#ifdef WHERETRACE_ENABLED /* 0x80000 */
    if( (sqlite3WhereTrace & 0x80000)!=0 && pWInfo->bStarUsed ){
      sqlite3DebugPrintf("WhereLoops changed by star-query heuristic:\n");
      for(pWLoop=pWInfo->pLoops; pWLoop; pWLoop=pWLoop->pNextLoop){







|
<
<
<
<
<
<
|
|





>
>
|
|
>
>
>
>
>
>
>
>
>






|







|


>

<
|







5559
5560
5561
5562
5563
5564
5565
5566






5567
5568
5569
5570
5571
5572
5573
5574
5575
5576
5577
5578
5579
5580
5581
5582
5583
5584
5585
5586
5587
5588
5589
5590
5591
5592
5593
5594
5595
5596
5597
5598
5599
5600
5601
5602
5603
5604
5605

5606
5607
5608
5609
5610
5611
5612
5613
          }else{
            nDep++;
            mSeen |= pWLoop->maskSelf;
          }
        }
      }
      if( nDep<=3 ) continue;
      /* If we reach this point, it means that pFactTab is a fact table */






     
#ifdef WHERETRACE_ENABLED
      /* Make sure rStarDelta values are initialized */
      if( !pWInfo->bStarUsed ){
        for(pWLoop=pWInfo->pLoops; pWLoop; pWLoop=pWLoop->pNextLoop){
          pWLoop->rStarDelta = 0;
        }
      }
#endif
      pWInfo->bStarUsed = 1;

      /* Compute one more than the maximum cost of any WhereLoop for the
      ** fact table */
      mxRun = LOGEST_MIN;
      for(pWLoop=pStart; pWLoop; pWLoop=pWLoop->pNextLoop){
        if( pWLoop->iTab<iFromIdx ) continue;
        if( pWLoop->iTab>iFromIdx ) break;
        if( pWLoop->rRun>mxRun ) mxRun = pWLoop->rRun;
      }
      if( ALWAYS(mxRun<LOGEST_MAX) ) mxRun++;

      /* Increase the cost of table scans for dimension tables to be slightly
      ** more than the maximum cost of fact table */
      for(pWLoop=pStart; pWLoop; pWLoop=pWLoop->pNextLoop){
        if( (pWLoop->maskSelf & mSeen)==0 ) continue;
        if( pWLoop->nLTerm ) continue;
        if( pWLoop->rRun<mxRun ){
#ifdef WHERETRACE_ENABLED /* 0x80000 */
          if( sqlite3WhereTrace & 0x80000 ){
            SrcItem *pDim = aFromTabs + pWLoop->iTab;
            sqlite3DebugPrintf(
              "Increase SCAN cost of dimension %s(%d) of fact %s(%d) to %d\n",
              pDim->zAlias ? pDim->zAlias: pDim->pSTab->zName, pWLoop->iTab,
              pFactTab->zAlias ? pFactTab->zAlias : pFactTab->pSTab->zName,
              iFromIdx, mxRun
            );
          }
          pWLoop->rStarDelta = mxRun - pWLoop->rRun;
#endif /* WHERETRACE_ENABLED */

          pWLoop->rRun = mxRun;
        }
      }
    }
#ifdef WHERETRACE_ENABLED /* 0x80000 */
    if( (sqlite3WhereTrace & 0x80000)!=0 && pWInfo->bStarUsed ){
      sqlite3DebugPrintf("WhereLoops changed by star-query heuristic:\n");
      for(pWLoop=pWInfo->pLoops; pWLoop; pWLoop=pWLoop->pNextLoop){
Changes to src/whereInt.h.
158
159
160
161
162
163
164

165
166

167
168
169
170
171
172
173
  } u;
  u32 wsFlags;          /* WHERE_* flags describing the plan */
  u16 nLTerm;           /* Number of entries in aLTerm[] */
  u16 nSkip;            /* Number of NULL aLTerm[] entries */
  /**** whereLoopXfer() copies fields above ***********************/
# define WHERE_LOOP_XFER_SZ offsetof(WhereLoop,nLSlot)
  u16 nLSlot;           /* Number of slots allocated for aLTerm[] */

  LogEst rStarDelta;    /* Cost delta due to star-schema heuristic.  Not
                        ** initialized unless pWInfo->bStarUsed */

  WhereTerm **aLTerm;   /* WhereTerms used */
  WhereLoop *pNextLoop; /* Next WhereLoop object in the WhereClause */
  WhereTerm *aLTermSpace[3];  /* Initial aLTerm[] space */
};

/* This object holds the prerequisites and the cost of running a
** subquery on one operand of an OR operator in the WHERE clause.







>


>







158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
  } u;
  u32 wsFlags;          /* WHERE_* flags describing the plan */
  u16 nLTerm;           /* Number of entries in aLTerm[] */
  u16 nSkip;            /* Number of NULL aLTerm[] entries */
  /**** whereLoopXfer() copies fields above ***********************/
# define WHERE_LOOP_XFER_SZ offsetof(WhereLoop,nLSlot)
  u16 nLSlot;           /* Number of slots allocated for aLTerm[] */
#ifdef WHERETRACE_ENABLED
  LogEst rStarDelta;    /* Cost delta due to star-schema heuristic.  Not
                        ** initialized unless pWInfo->bStarUsed */
#endif
  WhereTerm **aLTerm;   /* WhereTerms used */
  WhereLoop *pNextLoop; /* Next WhereLoop object in the WhereClause */
  WhereTerm *aLTermSpace[3];  /* Initial aLTerm[] space */
};

/* This object holds the prerequisites and the cost of running a
** subquery on one operand of an OR operator in the WHERE clause.
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
  u8 nLevel;                /* Number of nested loop */
  i8 nOBSat;                /* Number of ORDER BY terms satisfied by indices */
  u8 eOnePass;              /* ONEPASS_OFF, or _SINGLE, or _MULTI */
  u8 eDistinct;             /* One of the WHERE_DISTINCT_* values */
  unsigned bDeferredSeek :1;   /* Uses OP_DeferredSeek */
  unsigned untestedTerms :1;   /* Not all WHERE terms resolved by outer loop */
  unsigned bOrderedInnerLoop:1;/* True if only the inner-most loop is ordered */
  unsigned sorted :1;          /* True if really sorted (not just grouped) */
  unsigned bStarDone     :1;   /* True if check for star-query is complete */
  unsigned bStarUsed     :1;   /* True if cost adjustments for star-query */
  LogEst nRowOut;           /* Estimated number of output rows */
#ifdef WHERETRACE_ENABLED
  LogEst rTotalCost;        /* Total cost of the solution */
#endif
  int iTop;                 /* The very beginning of the WHERE loop */
  int iEndWhere;            /* End of the WHERE clause itself */
  WhereLoop *pLoops;        /* List of all WhereLoop objects */







|

|







483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
  u8 nLevel;                /* Number of nested loop */
  i8 nOBSat;                /* Number of ORDER BY terms satisfied by indices */
  u8 eOnePass;              /* ONEPASS_OFF, or _SINGLE, or _MULTI */
  u8 eDistinct;             /* One of the WHERE_DISTINCT_* values */
  unsigned bDeferredSeek :1;   /* Uses OP_DeferredSeek */
  unsigned untestedTerms :1;   /* Not all WHERE terms resolved by outer loop */
  unsigned bOrderedInnerLoop:1;/* True if only the inner-most loop is ordered */
  unsigned sorted        :1;   /* True if really sorted (not just grouped) */
  unsigned bStarDone     :1;   /* True if check for star-query is complete */
  unsigned bStarUsed     :1;   /* True if star-query heuristic is used */
  LogEst nRowOut;           /* Estimated number of output rows */
#ifdef WHERETRACE_ENABLED
  LogEst rTotalCost;        /* Total cost of the solution */
#endif
  int iTop;                 /* The very beginning of the WHERE loop */
  int iEndWhere;            /* End of the WHERE clause itself */
  WhereLoop *pLoops;        /* List of all WhereLoop objects */
Changes to test/starschema1.test.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# 2024-05-28
#
# The author disclaims copyright to this source code.  In place of
# a legal notice, here is a blessing:
#
#    May you do good and not evil.
#    May you find forgiveness for yourself and forgive others.
#    May you share freely, never taking more than you give.
#
#***********************************************************************
# 
# Test cases for the ability of the query planner to cope with
# star-schema queries on databases with goofy indexes.
#

set testdir [file dirname $argv0]
source $testdir/tester.tcl
set ::testprefix starschema1

do_execsql_test 1.1 {












|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# 2024-05-28
#
# The author disclaims copyright to this source code.  In place of
# a legal notice, here is a blessing:
#
#    May you do good and not evil.
#    May you find forgiveness for yourself and forgive others.
#    May you share freely, never taking more than you give.
#
#***********************************************************************
# 
# Test cases for the ability of the query planner to cope with
# star-schema queries.
#

set testdir [file dirname $argv0]
source $testdir/tester.tcl
set ::testprefix starschema1

do_execsql_test 1.1 {