ORDER BY not ordering with the correct result

(1) By jose isaias cabrera (jicman) on 2021-07-17 20:39:11 [source]

Greetings.

I have this SQL code,

WITH ProjIDs AS (VALUES 
   ('PR0000016675'),
   ('PR0000017285'),
   ('PR0000017416'),
   ('PR0000019109'),
   ('PR0000019129'),
   ('PR0000019130'),
   ('PR0000019131'),
   ('PR0000019133'),
   ('PR0000019135'),
   ('PR0000019136'),
   ('PR0000019139'),
   ('PR0000019140'),
   ('PR0000019141'),
   ('PR0000019142'),
   ('PR0000019143'),
   ('PR0000019147'),
   ('PR0000019148'),
   ('PR0000019150'),
   ('PR0000019152'),
   ('PR0000019164'),
   ('PR0000019176'),
   ('PR0000019180'),
   ('PR0000019186'),
   ('PR0000019191'),
   ('PR0000019764')
)
SELECT a.*,total(b.AnnualDossier) as Dossier FROM Project_List AS a 
LEFT JOIN ABT_Budget AS b ON a.ProjID = b.ProjID 
AND b.InsertDate = 
  ( 
     SELECT MAX(InsertDate) FROM ABT_Budget WHERE b.ProjID IN ProjIDs 
     OR 
     b.InsertDate IS NULL 
  ) 
WHERE a.ProjID IN ProjIDs 
AND a.InsertDate = 
( 
   SELECT MAX(InsertDate) FROM Project_List WHERE ProjID = a.ProjID  
) 

  GROUP BY a.ProjID
  ORDER BY a.Project_Manager;

But I am not getting a sorted by Project_Manager. What am I doing wrong? Thanks for the support.

josé

(2) By Keith Medcalf (kmedcalf) on 2021-07-17 21:38:12 in reply to 1 [link] [source]

So this is your query (reformatted for eye-bleeding prevention):

WITH ProjIDs AS
     (
      VALUES ('PR0000016675'),
             ('PR0000017285'),
             ('PR0000017416'),
             ('PR0000019109'),
             ('PR0000019129'),
             ('PR0000019130'),
             ('PR0000019131'),
             ('PR0000019133'),
             ('PR0000019135'),
             ('PR0000019136'),
             ('PR0000019139'),
             ('PR0000019140'),
             ('PR0000019141'),
             ('PR0000019142'),
             ('PR0000019143'),
             ('PR0000019147'),
             ('PR0000019148'),
             ('PR0000019150'),
             ('PR0000019152'),
             ('PR0000019164'),
             ('PR0000019176'),
             ('PR0000019180'),
             ('PR0000019186'),
             ('PR0000019191'),
             ('PR0000019764')
     )
   SELECT a.*,
          total(b.AnnualDossier) as Dossier
     FROM Project_List AS a
LEFT JOIN ABT_Budget AS b
       ON a.ProjID = b.ProjID
      AND b.InsertDate = (
                          SELECT MAX(InsertDate)
                            FROM ABT_Budget
                           WHERE b.ProjID IN ProjIDs
                              OR b.InsertDate IS NULL
                         )
    WHERE a.ProjID IN ProjIDs
      AND a.InsertDate = (
                          SELECT MAX(InsertDate)
                            FROM Project_List
                           WHERE ProjID = a.ProjID
                         )
  GROUP BY a.ProjID
  ORDER BY a.Project_Manager
;

I don't see a problem except for this strange construction that I cannot decode:

      AND b.InsertDate = (
                          SELECT MAX(InsertDate)
                            FROM ABT_Budget
                           WHERE b.ProjID IN ProjIDs
                              OR b.InsertDate IS NULL
                         )

which means that the descent row is chosen if the insertdate is the max(insertdate) of all the projects being reported (theoretically and convolutedly). In particular this:

b.InsertDate = (b.InsertDate is NULL)

which will ALWAYS without exception evaluate to FALSE, so serves no purpose.

What is the "intention" of this code as it seems somewhat confusing?

We already know that b.ProjID must be in ProjIDs (it is a selection candidate for the table a (Project_List)). If you simply want the max(InsertDate) for all ProjIDs in ABT_Budget, there is no reason for this to be a correlated query. You only need to execute it once because the result will not change. Get rid of the OR and get rid of the b.ProjID (it should just be ProjID).

Of course, if what you mean is that you want b.InsertDate to be the max(InsertDate) for that project (as in correlated) then it should be b.InsertDate = (select max(insertdate) from abt_budget where ProjId = b.ProjID).

(3) By jose isaias cabrera (jicman) on 2021-07-17 22:43:41 in reply to 2 [link] [source]

So this is your query (reformatted for eye-bleeding prevention):

Thanks for the rearranging. Much cleaner and easier to the eyes. :-)

[clip]

b.InsertDate = (b.InsertDate is NULL)

What is the "intention" of this code as it seems somewhat confusing?

This is because if there are no result from total(b.AnnualDossier), in other words, the ProjID does not exists in the table b (ABT_Budget), then that record won't show in the result. So, with this assignment, the result has all the records listed in the WITH part of the code.

Of course, if what you mean is that you want b.InsertDate to be the max(InsertDate) for that project (as in correlated) then it should be b.InsertDate = (select max(insertdate) from abt_budget where ProjId = b.ProjID).

Yes, you're right. Thanks. I found out what it was not sorting (spaces in front the data). But, this is also much faster by using your suggested changes.

(4) By Keith Medcalf (kmedcalf) on 2021-07-17 22:52:59 in reply to 3 [link] [source]

This may not be really efficient, but I assume this is really what you are trying to do:

WITH ProjIDs AS
     (
      VALUES ('PR0000016675'),
             ('PR0000017285'),
             ('PR0000017416'),
             ('PR0000019109'),
             ('PR0000019129'),
             ('PR0000019130'),
             ('PR0000019131'),
             ('PR0000019133'),
             ('PR0000019135'),
             ('PR0000019136'),
             ('PR0000019139'),
             ('PR0000019140'),
             ('PR0000019141'),
             ('PR0000019142'),
             ('PR0000019143'),
             ('PR0000019147'),
             ('PR0000019148'),
             ('PR0000019150'),
             ('PR0000019152'),
             ('PR0000019164'),
             ('PR0000019176'),
             ('PR0000019180'),
             ('PR0000019186'),
             ('PR0000019191'),
             ('PR0000019764')
     ),
     Proj_List AS
     (
      select *
        from Project_List as o
       where ProjID in ProjIDs
         and InsertDate = (
                           select max(InsertDate)
                             from Project_List
                            where ProjID = o.ProjID
                          )
     ),
     Proj_Budget AS
     (
      select *
        from ABT_Budget as o
       where ProjID in ProjIDs
         and InsertDate = (
                           select max(InsertDate)
                             from ABT_Budget
                            where ProjID = o.ProjID
                          )
     ),
   select Proj_List.*,
          sum(AnnualDossier) as Dossier
     from Proj_List as a
left join Proj_Budget as b
       on a.ProjID == b.ProjID
 group by a.ProjID
 order by a.Project_Manager
;

That is, given a list of Project IDs, find all the Proj_List such that the ProjID is in the list and the InsertDate for that Project Record is the maximum InsertDate for all records bearing that Project ID

Mutatis mutandis for Proj_Budget.

The with the Proj_List in the outer loop, find and sum all the Proj_Budget.AnnualDossier for that project ...

And produce the output in order by Project_Manager. (with multiple projects overseen by the same manager appearing in random order).

(5) By jose isaias cabrera (jicman) on 2021-07-17 23:30:08 in reply to 4 [link] [source]

Wow! How do you come up with these SQL combinations? Someday I will be like you. :-)

I did made these changes to the final select to make it work,

     )                                       -- took the comma out
   select a.*,                               -- replace Proj_List with a
          sum(b.AnnualDossier) as Dossier    -- added a b. in the sum function
     from Proj_List as a
left join Proj_Budget as b
       on a.ProjID == b.ProjID
 group by a.ProjID
 order by a.Project_Manager
;

and it worked. But it looks like they both have the same speed. This is the expert result for the last one you provided:


(no new indexes)

MATERIALIZE Proj_Budget
SEARCH o USING COVERING INDEX sqlite_autoindex_ABT_Budget_1 (ProjID=?)
LIST SUBQUERY 29
MATERIALIZE ProjIDs
SCAN 25 CONSTANT ROWS
SCAN ProjIDs
CORRELATED SCALAR SUBQUERY 30
SEARCH ABT_Budget USING COVERING INDEX ABT_Budget_idx_9bd20c76 (ProjID=?)
SEARCH o USING INDEX PL_ProjID (ProjID=?)
LIST SUBQUERY 26
SCAN ProjIDs
CORRELATED SCALAR SUBQUERY 27
SEARCH Project_List USING COVERING INDEX Project_List_idx_292415b1 (ProjID=?)
SEARCH b USING AUTOMATIC COVERING INDEX (ProjID=?)
USE TEMP B-TREE FOR ORDER BY

and this is for the previous one with your suggested fix:

(no new indexes)

SEARCH a USING INDEX PL_ProjID (ProjID=?)
LIST SUBQUERY 27
CO-ROUTINE ProjIDs
SCAN 25 CONSTANT ROWS
SCAN ProjIDs
CORRELATED SCALAR SUBQUERY 28
SEARCH Project_List USING COVERING INDEX Project_List_idx_292415b1 (ProjID=?)
SEARCH b USING COVERING INDEX PAB_ProjID_AnnualDossier_InsertDate (ProjID=?)
CORRELATED SCALAR SUBQUERY 26
SEARCH ABT_Budget USING COVERING INDEX ABT_Budget_idx_9bd20c76 (ProjID=?)
USE TEMP B-TREE FOR ORDER BY

Don't really understand this well, but which one is better? :-)

(6.1) By Keith Medcalf (kmedcalf) on 2021-07-17 23:41:59 edited from 6.0 in reply to 5 [link] [source]

I think they are probably about the same so I would generally prefer the latter one because it makes more clear exactly what you are doing by defining the things you need to define (the ProjID list), and then generating the "Proj_List" and "Proj_Budget" minimal set of rows you are interested in, then using a pretty simple "final select / projection" to get what you want to see.

That form is easiest to understand and leaves all the work to the query planner to figure out how best to compute what you asked for.

WITH ProjIDs AS
     (
      VALUES ('PR0000016675'),
             ('PR0000017285'),
             ('PR0000017416'),
             ('PR0000019109'),
             ('PR0000019129'),
             ('PR0000019130'),
             ('PR0000019131'),
             ('PR0000019133'),
             ('PR0000019135'),
             ('PR0000019136'),
             ('PR0000019139'),
             ('PR0000019140'),
             ('PR0000019141'),
             ('PR0000019142'),
             ('PR0000019143'),
             ('PR0000019147'),
             ('PR0000019148'),
             ('PR0000019150'),
             ('PR0000019152'),
             ('PR0000019164'),
             ('PR0000019176'),
             ('PR0000019180'),
             ('PR0000019186'),
             ('PR0000019191'),
             ('PR0000019764')
     ),
     Proj_List AS
     (
      select *
        from Project_List as o
       where ProjID in ProjIDs
         and InsertDate = (
                           select max(InsertDate)
                             from Project_List
                            where ProjID = o.ProjID
                          )
     ),
     Proj_Budget AS
     (
      select *
        from ABT_Budget as o
       where InsertDate = (
                           select max(InsertDate)
                             from ABT_Budget
                            where ProjID = o.ProjID
                          )
     )
   select a.*,
          sum(b.AnnualDossier) as Dossier
     from Proj_List as a
left join Proj_Budget as b
       on a.ProjID == b.ProjID
 group by a.ProjID
 order by a.Project_Manager
;

(7) By jose isaias cabrera (jicman) on 2021-07-17 23:42:18 in reply to 6.0 [link] [source]

And it's also a great teaching tool for me. :-) Now I know that I can do this with a bunch of queries that are written with lots of weaknesses. I also was able to understand your syntax and setup of the various pieces of the puzzle. Thanks again for your wonderful work on this forum.

josé