Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Small size and complexity reduction on the star-query heuristic. Improved comments for the star-query heuristic. |
---|---|
Downloads: | Tarball | ZIP archive | SQL archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
a7ecb2f4b7eee78b88f1b2e026dffed2 |
User & Date: | drh 2025-01-26 17:29:33 |
Context
2025-01-26
| ||
20:09 | Further comment improvements in the star-query heuristic. Add an ALWAYS() on an unreachable branch to achieve MC/DC. (check-in: 5e18ce68 user: drh tags: trunk) | |
17:29 | Small size and complexity reduction on the star-query heuristic. Improved comments for the star-query heuristic. (check-in: a7ecb2f4 user: drh tags: trunk) | |
2025-01-25
| ||
23:04 | Revise the strategy used by the star-query heuristic: Instead of decreasing the cost of all fact-table WhereLoops, increase the run-cost of WhereLoops that are SCANs of dimension tables. (check-in: 1bc09c9e user: drh tags: trunk) | |
Changes
Changes to src/sqliteInt.h.
︙ | ︙ | |||
877 878 879 880 881 882 883 884 885 886 887 888 889 890 | ** ** The LogEst can be negative to indicate fractional values. ** Examples: ** ** 0.5 -> -10 0.1 -> -33 0.0625 -> -40 */ typedef INT16_TYPE LogEst; /* ** Set the SQLITE_PTRSIZE macro to the number of bytes in a pointer */ #ifndef SQLITE_PTRSIZE # if defined(__SIZEOF_POINTER__) # define SQLITE_PTRSIZE __SIZEOF_POINTER__ | > > | 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 | ** ** The LogEst can be negative to indicate fractional values. ** Examples: ** ** 0.5 -> -10 0.1 -> -33 0.0625 -> -40 */ typedef INT16_TYPE LogEst; #define LOGEST_MIN (-32768) #define LOGEST_MAX (32767) /* ** Set the SQLITE_PTRSIZE macro to the number of bytes in a pointer */ #ifndef SQLITE_PTRSIZE # if defined(__SIZEOF_POINTER__) # define SQLITE_PTRSIZE __SIZEOF_POINTER__ |
︙ | ︙ |
Changes to src/where.c.
︙ | ︙ | |||
5447 5448 5449 5450 5451 5452 5453 | ** ** 18 for star queries ** 12 otherwise ** ** For the purposes of this heuristic, a star-query is defined as a query ** with a large central table that is joined using an INNER JOIN, ** not CROSS or OUTER JOINs, against four or more smaller tables. | | > | | | > | < > > > > > > > > > > > > > > > > > > | | 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 5457 5458 5459 5460 5461 5462 5463 5464 5465 5466 5467 5468 5469 5470 5471 5472 5473 5474 5475 5476 5477 5478 5479 5480 5481 5482 5483 5484 5485 5486 5487 5488 5489 5490 5491 5492 5493 5494 5495 5496 5497 5498 5499 5500 5501 5502 5503 5504 5505 5506 | ** ** 18 for star queries ** 12 otherwise ** ** For the purposes of this heuristic, a star-query is defined as a query ** with a large central table that is joined using an INNER JOIN, ** not CROSS or OUTER JOINs, against four or more smaller tables. ** The central table is called the "fact" table. The smaller tables ** that get joined are "dimension tables". Also, any table that is ** self-joined cannot be a dimension table; we assume that dimension ** tables may only be joined against fact tables. ** ** SIDE EFFECT: (and really the whole point of this subroutine) ** ** If pWInfo describes a star-query, then the cost for SCANs of dimension ** WhereLoops is increased to be slightly larger than the cost of a SCAN ** in the fact table. Only SCAN costs are increased. SEARCH costs are ** unchanged. This heuristic helps keep fact tables in outer loops. Without ** this heuristic, paths with fact tables in outer loops tend to get pruned ** by the mxChoice limit on the number of paths, resulting in poor query ** plans. See the starschema1.test test module for examples of queries ** that need this heuristic to find good query plans. ** ** This heuristic can be completely disabled, so that no query is ** considered a star-query, using SQLITE_TESTCTRL_OPTIMIZATION to ** disable the SQLITE_StarQuery optimization. In the CLI, the command ** to do that is: ".testctrl opt -starquery". ** ** HISTORICAL NOTES: ** ** This optimization was first added on 2024-05-09 by check-in 38db9b5c83d. ** The original optimization reduced the cost and output size estimate for ** fact tables to help them move to outer loops. But months later (as people ** started upgrading) performance regression reports started caming in, ** including: ** ** forum post b18ef983e68d06d1 (2024-12-21) ** forum post 0025389d0860af82 (2025-01-14) ** forum post d87570a145599033 (2025-01-17) ** ** To address these, the criteria for a star-query was tightened to exclude ** cases where the fact and dimensions are separated by an outer join, and ** the affect of star-schema detection was changed to increase the rRun cost ** on just full table scans of dimension tables, rather than reducing costs ** in the all access methods of the fact table. */ static int computeMxChoice(WhereInfo *pWInfo){ int nLoop = pWInfo->nLevel; /* Number of terms in the join */ WhereLoop *pWLoop; /* For looping over WhereLoops */ #ifdef SQLITE_DEBUG /* The star-query detection code below makes use of the following ** properties of the WhereLoop list, so verify them before ** continuing: ** (1) .maskSelf is the bitmask corresponding to .iTab ** (2) The WhereLoop list is in ascending .iTab order */ for(pWLoop=pWInfo->pLoops; pWLoop; pWLoop=pWLoop->pNextLoop){ assert( pWLoop->maskSelf==MASKBIT(pWLoop->iTab) ); assert( pWLoop->pNextLoop==0 || pWLoop->iTab<=pWLoop->pNextLoop->iTab ); |
︙ | ︙ | |||
5496 5497 5498 5499 5500 5501 5502 | int iFromIdx; /* Term of FROM clause is the candidate fact-table */ Bitmask m; /* Bitmask for candidate fact-table */ Bitmask mSelfJoin = 0; /* Tables that cannot be dimension tables */ WhereLoop *pStart; /* Where to start searching for dimension-tables */ pWInfo->bStarDone = 1; /* Only do this computation once */ | | > | | | 5515 5516 5517 5518 5519 5520 5521 5522 5523 5524 5525 5526 5527 5528 5529 5530 5531 5532 | int iFromIdx; /* Term of FROM clause is the candidate fact-table */ Bitmask m; /* Bitmask for candidate fact-table */ Bitmask mSelfJoin = 0; /* Tables that cannot be dimension tables */ WhereLoop *pStart; /* Where to start searching for dimension-tables */ pWInfo->bStarDone = 1; /* Only do this computation once */ /* Check to see if we are dealing with a star schema and if so, adjust ** SCAN cost of dimensino tables so that they are as large as the SCAN ** cost of the fact table. This is a heuristic that helps keep the ** fact tables in outer loops. */ assert( !pWInfo->bStarUsed ); aFromTabs = pWInfo->pTabList->a; pStart = pWInfo->pLoops; for(iFromIdx=0, m=1; iFromIdx<nLoop; iFromIdx++, m<<=1){ int nDep = 0; /* Number of dimension tables */ LogEst mxRun; /* Maximum SCAN cost of a fact table */ |
︙ | ︙ | |||
5539 5540 5541 5542 5543 5544 5545 | }else{ nDep++; mSeen |= pWLoop->maskSelf; } } } if( nDep<=3 ) continue; | | < < < < < < | | > > | | > > > > > > > > > | | > < | | 5559 5560 5561 5562 5563 5564 5565 5566 5567 5568 5569 5570 5571 5572 5573 5574 5575 5576 5577 5578 5579 5580 5581 5582 5583 5584 5585 5586 5587 5588 5589 5590 5591 5592 5593 5594 5595 5596 5597 5598 5599 5600 5601 5602 5603 5604 5605 5606 5607 5608 5609 5610 5611 5612 5613 | }else{ nDep++; mSeen |= pWLoop->maskSelf; } } } if( nDep<=3 ) continue; /* If we reach this point, it means that pFactTab is a fact table */ #ifdef WHERETRACE_ENABLED /* Make sure rStarDelta values are initialized */ if( !pWInfo->bStarUsed ){ for(pWLoop=pWInfo->pLoops; pWLoop; pWLoop=pWLoop->pNextLoop){ pWLoop->rStarDelta = 0; } } #endif pWInfo->bStarUsed = 1; /* Compute one more than the maximum cost of any WhereLoop for the ** fact table */ mxRun = LOGEST_MIN; for(pWLoop=pStart; pWLoop; pWLoop=pWLoop->pNextLoop){ if( pWLoop->iTab<iFromIdx ) continue; if( pWLoop->iTab>iFromIdx ) break; if( pWLoop->rRun>mxRun ) mxRun = pWLoop->rRun; } if( ALWAYS(mxRun<LOGEST_MAX) ) mxRun++; /* Increase the cost of table scans for dimension tables to be slightly ** more than the maximum cost of fact table */ for(pWLoop=pStart; pWLoop; pWLoop=pWLoop->pNextLoop){ if( (pWLoop->maskSelf & mSeen)==0 ) continue; if( pWLoop->nLTerm ) continue; if( pWLoop->rRun<mxRun ){ #ifdef WHERETRACE_ENABLED /* 0x80000 */ if( sqlite3WhereTrace & 0x80000 ){ SrcItem *pDim = aFromTabs + pWLoop->iTab; sqlite3DebugPrintf( "Increase SCAN cost of dimension %s(%d) of fact %s(%d) to %d\n", pDim->zAlias ? pDim->zAlias: pDim->pSTab->zName, pWLoop->iTab, pFactTab->zAlias ? pFactTab->zAlias : pFactTab->pSTab->zName, iFromIdx, mxRun ); } pWLoop->rStarDelta = mxRun - pWLoop->rRun; #endif /* WHERETRACE_ENABLED */ pWLoop->rRun = mxRun; } } } #ifdef WHERETRACE_ENABLED /* 0x80000 */ if( (sqlite3WhereTrace & 0x80000)!=0 && pWInfo->bStarUsed ){ sqlite3DebugPrintf("WhereLoops changed by star-query heuristic:\n"); for(pWLoop=pWInfo->pLoops; pWLoop; pWLoop=pWLoop->pNextLoop){ |
︙ | ︙ |
Changes to src/whereInt.h.
︙ | ︙ | |||
158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 | } u; u32 wsFlags; /* WHERE_* flags describing the plan */ u16 nLTerm; /* Number of entries in aLTerm[] */ u16 nSkip; /* Number of NULL aLTerm[] entries */ /**** whereLoopXfer() copies fields above ***********************/ # define WHERE_LOOP_XFER_SZ offsetof(WhereLoop,nLSlot) u16 nLSlot; /* Number of slots allocated for aLTerm[] */ LogEst rStarDelta; /* Cost delta due to star-schema heuristic. Not ** initialized unless pWInfo->bStarUsed */ WhereTerm **aLTerm; /* WhereTerms used */ WhereLoop *pNextLoop; /* Next WhereLoop object in the WhereClause */ WhereTerm *aLTermSpace[3]; /* Initial aLTerm[] space */ }; /* This object holds the prerequisites and the cost of running a ** subquery on one operand of an OR operator in the WHERE clause. | > > | 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 | } u; u32 wsFlags; /* WHERE_* flags describing the plan */ u16 nLTerm; /* Number of entries in aLTerm[] */ u16 nSkip; /* Number of NULL aLTerm[] entries */ /**** whereLoopXfer() copies fields above ***********************/ # define WHERE_LOOP_XFER_SZ offsetof(WhereLoop,nLSlot) u16 nLSlot; /* Number of slots allocated for aLTerm[] */ #ifdef WHERETRACE_ENABLED LogEst rStarDelta; /* Cost delta due to star-schema heuristic. Not ** initialized unless pWInfo->bStarUsed */ #endif WhereTerm **aLTerm; /* WhereTerms used */ WhereLoop *pNextLoop; /* Next WhereLoop object in the WhereClause */ WhereTerm *aLTermSpace[3]; /* Initial aLTerm[] space */ }; /* This object holds the prerequisites and the cost of running a ** subquery on one operand of an OR operator in the WHERE clause. |
︙ | ︙ | |||
481 482 483 484 485 486 487 | u8 nLevel; /* Number of nested loop */ i8 nOBSat; /* Number of ORDER BY terms satisfied by indices */ u8 eOnePass; /* ONEPASS_OFF, or _SINGLE, or _MULTI */ u8 eDistinct; /* One of the WHERE_DISTINCT_* values */ unsigned bDeferredSeek :1; /* Uses OP_DeferredSeek */ unsigned untestedTerms :1; /* Not all WHERE terms resolved by outer loop */ unsigned bOrderedInnerLoop:1;/* True if only the inner-most loop is ordered */ | | | | 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 | u8 nLevel; /* Number of nested loop */ i8 nOBSat; /* Number of ORDER BY terms satisfied by indices */ u8 eOnePass; /* ONEPASS_OFF, or _SINGLE, or _MULTI */ u8 eDistinct; /* One of the WHERE_DISTINCT_* values */ unsigned bDeferredSeek :1; /* Uses OP_DeferredSeek */ unsigned untestedTerms :1; /* Not all WHERE terms resolved by outer loop */ unsigned bOrderedInnerLoop:1;/* True if only the inner-most loop is ordered */ unsigned sorted :1; /* True if really sorted (not just grouped) */ unsigned bStarDone :1; /* True if check for star-query is complete */ unsigned bStarUsed :1; /* True if star-query heuristic is used */ LogEst nRowOut; /* Estimated number of output rows */ #ifdef WHERETRACE_ENABLED LogEst rTotalCost; /* Total cost of the solution */ #endif int iTop; /* The very beginning of the WHERE loop */ int iEndWhere; /* End of the WHERE clause itself */ WhereLoop *pLoops; /* List of all WhereLoop objects */ |
︙ | ︙ |
Changes to test/starschema1.test.
1 2 3 4 5 6 7 8 9 10 11 12 | # 2024-05-28 # # The author disclaims copyright to this source code. In place of # a legal notice, here is a blessing: # # May you do good and not evil. # May you find forgiveness for yourself and forgive others. # May you share freely, never taking more than you give. # #*********************************************************************** # # Test cases for the ability of the query planner to cope with | | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | # 2024-05-28 # # The author disclaims copyright to this source code. In place of # a legal notice, here is a blessing: # # May you do good and not evil. # May you find forgiveness for yourself and forgive others. # May you share freely, never taking more than you give. # #*********************************************************************** # # Test cases for the ability of the query planner to cope with # star-schema queries. # set testdir [file dirname $argv0] source $testdir/tester.tcl set ::testprefix starschema1 do_execsql_test 1.1 { |
︙ | ︙ |