Reporting performance issue
(1) By Jinho Jung (jinhojun) on 2021-02-23 14:33:03
# Performance issues discovered from differential test Hello. We are studying DBMS from GeorgiaTech and reporting interesting queries that potentially show performance problems. To discover such cases, we used the following procedures: * Install four DBMSs with latest version (PostgreSQL, SQLite, MySQL, CockroachDB) * Import TPCC-C benchmark for each DBMS * Generate random query (and translate the query to handle different dialects) * Run the query and measure the query execution time * Remove `LIMIT` to prevent any non-deterministic behaviors * Discard the test case if any DBMS returned an error * Some DBMS does not show the actual query execution time. In this case, query the `current time` before and after the actual query, and then we calculate the elapsed time. In this report, we attached a few queries. We believe that there are many duplicated or false-positive cases. It would be great if we can get feedback about the reported queries. Once we know the root cause of the problem or false positive, we will make a follow-up report after we remove them all. * Query files link: ```sh wget https://gts3.org/~jjung/report1/sq.tar.gz ``` * Execution result (execution time (second)) | Filename | Postgres | Mysql | Cockroachdb | Sqlite | Ratio | |---------:|---------:|---------:|------------:|---------:|---------:| | 1753 | 0.00421 | 0.013 | 0.02623 | 4.639 | 1102.40 | | 602 | 0.00194 | 0.004 | 0.01218 | 2.497 | 1286.32 | | 1948 | 0.00168 | 0.001 | 0.00356 | 1.912 | 1912.14 | | 2064 | 0.00117 | 0.0009 | 0.00202 | 2.271 | 2523.33 | | 1009 | 0.00194 | 0.001 | 0.03355 | 2.783 | 2782.54 | | 389 | 0.00112 | 0.0009 | 0.00193 | 2.375 | 2638.89 | | 4293 | 0.00204 | 0.006 | 0.02212 | 2.573 | 1263.25 | | 2121 | 0.00171 | 0.0009 | 0.00594 | 2.515 | 2794.44 | | 4576 | 0.00168 | 0.005 | 0.0203 | 3.447 | 2050.46 | | 1140 | 0.00148 | 0.005 | 0.00998 | 2.698 | 1823.14 | | 3580 | 0.00132 | 0.001 | 0.0016 | 2.817 | 2817.20 | | 2973 | 0.00202 | 0.005 | 0.01454 | 2.747 | 1362.71 | | 1642 | 0.00243 | 0.01 | 0.00597 | 2.537 | 1046.10 | | 1539 | 0.00254 | 0.004 | 0.00563 | 2.569 | 1009.48 | | 2187 | 0.0121 | 0.054 | 0.05258 | 5.036 | 416.17 | | 4475 | 0.00163 | 0.005 | 0.01066 | 2.822 | 1735.53 | | 3733 | 0.00109 | 0.006 | 0.00729 | 2.451 | 2248.52 | | 3805 | 0.00257 | 0.001 | 0.0209 | 3.735 | 3735.27 | | 1292 | 0.00142 | 0.003 | 0.02727 | 1.252 | 884.20 | | 2799 | 0.00227 | 0.005 | 0.05051 | 6.286 | 2770.35 | | 2070 | 0.00196 | 0.005 | 0.01153 | 2.567 | 1306.49 | | 4315 | 0.00193 | 0.0009 | 0.07619 | 2.779 | 3087.78 | | 1159 | 0.00148 | 0.0009 | 0.00535 | 2.503 | 2781.11 | | 2214 | 0.00106 | 0.005 | 0.00967 | 1.386 | 1303.72 | | 2233 | 0.00164 | 0.005 | 0.01079 | 2.925 | 1786.82 | | 2171 | 2.75006 | 0.285 | 0.64789 | 3.429 | 12.03 | | 4383 | 0.00252 | 0.007 | 0.01062 | 2.771 | 1099.57 | | 325 | 0.00154 | 0.004 | 0.00882 | 2.292 | 1489.29 | | 2583 | 0.00219 | 0.001 | 0.01203 | 2.923 | 2922.51 | | 1027 | 0.00186 | 0.004 | 0.00663 | 2.311 | 1243.81 | | 641 | 0.00189 | 0.004 | 0.00585 | 1.367 | 724.12 | | 185 | 0.68682 | 0.0009 | 0.04701 | 1.553 | 1725.56 | | 4336 | 0.00105 | 0.011 | 0.01241 | 2.512 | 2387.51 | | 2680 | 0.00143 | 0.006 | 0.03904 | 5.988 | 4187.31 | | 1132 | 0.0849 | 0.001 | 0.24651 | 1.423 | 1422.76 | | 597 | 0.00252 | 0.005 | 0.00607 | 2.55 | 1013.02 | | 3315 | 0.00151 | 0.006 | 0.01007 | 2.699 | 1783.87 | | 3698 | 0.00107 | 0.006 | 0.0069 | 2.49 | 2333.81 | | 207 | 0.00148 | 0.004 | 0.00992 | 2.498 | 1691.26 | | 3714 | 0.00165 | 0.006 | 0.01072 | 3.975 | 2406.17 | | 1990 | 0.00144 | 0.126 | 0.44784 | 1.174 | 814.17 | | 1858 | 0.00175 | 0.02 | 0.04427 | 5.774 | 3303.04 | | 3741 | 0.00156 | 0.005 | 0.01082 | 2.677 | 1719.21 | | 4369 | 0.04095 | 0.0009 | 0.5444 | 1.355 | 1505.56 | | 1657 | 0.01819 | 0.001 | 0.21661 | 1.028 | 1028.07 | | 4580 | 0.00149 | 0.006 | 0.01062 | 2.772 | 1860.56 | | 432 | 0.00115 | 0.005 | 0.00669 | 2.332 | 2034.77 | | 2333 | 0.00173 | 0.005 | 0.00685 | 2.365 | 1367.83 | | 2498 | 0.00185 | 0.0009 | 0.01235 | 3.995 | 4438.89 | | 999 | 0.00102 | 0.005 | 0.00596 | 2.314 | 2257.65 | | 2228 | 0.00129 | 0.006 | 0.01223 | 1.428 | 1103.64 | | 531 | 0.00153 | 0.0009 | 0.00501 | 2.468 | 2742.22 | | 3438 | 0.00128 | 0.02 | 0.02662 | 5.674 | 4439.19 | ---- # Reproduce: install DBMSs, import TPCC benchmark, run query ### Cockroach (from binary) ```sh # install DBMS wget https://binaries.cockroachdb.com/cockroach-v20.2.5.linux-amd64.tgz tar xzvf cockroach-v20.2.5.linux-amd64.tgz sudo cp -i cockroach-v20.2.5.linux-amd64/cockroach /usr/local/bin/cockroach20 sudo mkdir -p /usr/local/lib/cockroach sudo cp -i cockroach-v20.2.5.linux-amd64/lib/libgeos.so /usr/local/lib/cockroach/ sudo cp -i cockroach-v20.2.5.linux-amd64/lib/libgeos_c.so /usr/local/lib/cockroach/ # test which cockroach20 cockroach20 demo # start the DBMS (to make initial node files) cockroach20 start-single-node --insecure --store=node20 --listen-addr=localhost:26259 --http-port=28080 --max-sql-memory=1GB --background # quit cockroach20 quit --insecure --host=localhost:26259 # import DB mkdir -p node20/extern wget https://gts3.org/~jjung/tpcc-perf/tpcc_cr.tar.gz tar xzvf tpcc_cr.tar.gz cp tpcc_cr.sql node20/tpcc.sql # start the DBMS again and createdb cockroach20 sql --insecure --host=localhost:26259 --execute="CREATE DATABASE IF NOT EXISTS cockroachdb;" --cockroach20 sql --insecure --host=localhost:26259 --execute="DROP DATABASE cockroachdb;" cockroach20 sql --insecure --host=localhost:26259 --database=cockroachdb --execute="IMPORT PGDUMP 'nodelocal://self/tpcc.sql';" # test cockroach20 sql --insecure --host=localhost:26259 --database=cockroachdb --execute="explain analyze select count(*) from order_line;" # run query cockroach20 sql --insecure --host=localhost --port=26259 --database=cockroachdb < query.sql ``` ### Mysql (from package manager) ```sh # install DBMS sudo apt install mysql-server sudo mysql_secure_installation sudo su mysql SHOW VARIABLES LIKE 'validate_password%'; SET GLOBAL validate_password_length=4; SET GLOBAL validate_password_policy = 0; CREATE USER 'mysql'@'localhost' IDENTIFIED BY 'mysql'; alter user 'root'@'localhost' identified by 'mysql'; # modify the conf (should add "skip-grant-tables" under [mysqld]) sudo vim /etc/mysql/mysql.conf.d # import DB wget https://gts3.org/~jjung/tpcc-perf/tpcc_my.tar.gz tar xzvf tpcc_my.tar.gz mysql -u mysql -pmysql -e "create database mysqldb" mysql -u mysql -pmysql mysqldb < tpcc_my.sql # test mysql -u mysql -pmysql mysqldb -e "show tables" mysql -u mysql -pmysql mysqldb -e "select * from customer" # run query mysql -u mysql -pmysql mysqldb < query.sql ``` ### Postgre (from SRC) ```sh # build latest postgres git clone https://github.com/postgres/postgres.git mkdir bld cd bld ../configure make -j 20 # install DBMS sudo su make install adduser postgres rm -rf /usr/local/pgsql/data mkdir /usr/local/pgsql/data chown -R postgres /usr/local/pgsql/data su - postgres /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start /usr/local/pgsql/bin/createdb postgresdb /usr/local/pgsql/bin/psql postgresdb /usr/local/pgsql/bin/createuser -s {username} /usr/local/pgsql/bin/createdb {username} /usr/local/pgsql/bin/psql =# alter {username} with superuser # import DB wget https://gts3.org/~jjung/tpcc-perf/tpcc_pg.tar.gz tar xzvf tpcc_pg.tar.gz /usr/local/pgsql/bin/psql -p 5432 -d postgresdb -f tpcc_pg.sql # test /usr/local/pgsql/bin/psql -p 5432 -d postgresdb -c "select * from warehouse" # run query /usr/local/pgsql/bin/psql -p 5432 -d postgresdb -f query.sql ``` ### Sqlite (from SRC) ```sh # build latest sqlite from src git clone https://github.com/sqlite/sqlite.git cd sqlite mkdir bld cd bld ../configure make -j 20 # install DBMS sudo make install # import DB wget https://gts3.org/~jjung/tpcc-perf/tpcc_sq.tar.gz tar xzvf tpcc_sq.tar.gz # test sqlite3 tpcc_sq.db sqlite> select * from warehouse; # run query sqlite3 tpcc_sq.db < query.sql ```
(2.1) By Dan Kennedy (dan) on 2021-02-24 17:21:02 edited from 2.0 in reply to 1 [link]
Thanks for posting these. Very interesting indeed. All of them seem to be cases where the planner should be able to determine that no work (or very little) is required. I guess that's a product of your 1000x threshold - it's hard to be 1000x slower unless you miss a really good shortcut! I think instances where one db is consistently more than (say) 10x slower would be quite interesting as well. Anyway, for the record these all seem to come down to two things: (1) SQLite is not taking advantage of NOT NULL constraints when processing IS NULL. With SQLite, if you do: CREATE TABLE t1(a NOT NULL); SELECT * FROM t1 WHERE a IS NULL; SQLite does a full-table scan. But it should be possible to determine at compile-time that any "col IS NULL" expression for which col has a NOT NULL constraint and does not come from a table on the right-hand-side of a LEFT JOIN is always false. (2) SQLite is not ignoring columns in the result-set of an EXISTS(SELECT ...) when determining whether or not the SELECT is correlated. i.e. if you do: CREATE TABLE t1(a, b); CREATE TABLE t2(x, y); SELECT * FROM t1 WHERE EXISTS (SELECT a FROM t2 WHERE x=1); then SQLite runs the sub-query once for every row of t1 because it thinks the "a" in the sub-query makes it correlated. It doesn't - the sub-query only has to be run once. Naive changes to support the two optimizations above fix most of these examples. But there are a couple of tricks. For example, query 1753 features the equivalent of: CREATE TABLE t1(a, b); CREATE TABLE t2(x, y NOT NULL); SELECT * FROM t1 LEFT JOIN t2 WHERE t1.a=t2.x AND y IS NULL; In the query, the LEFT JOIN can be transformed to a regular join (because t1.a=t2.x implies that t2.x cannot be NULL) which allows the database to determine that "y IS NULL" will always be false. But currently, by the time SQLite realizes it can transform the join it's tricky to tell exactly which LEFT JOIN (there may be many in the query) column "y" came from. There's a similar problem in 1948: CREATE TABLE t1(a, b); CREATE TABLE t2(x, y NOT NULL); CREATE TABLE t3(c, d); SELECT * FROM t1 LEFT JOIN t2 WHERE t1.a=t2.x EXISTS (SELECT 1 FROM t3 WHERE t2.y IS NULL OR t3.c=10); By the time the LEFT JOIN is transformed to a regular join, the sub-select has already been marked as correlated. So even though the "t2.y IS NULL" is transformed to "false", making the sub-select uncorrelated, SQLite still runs it once for every row visited by the query. Apparently, other database engines do better in these cases. I'm not sure how much of a rush we'll be in to fix these specific cases. Something for after 3.35.0 anyway. Thanks, Dan.
(3) By Jinho Jung (jinhojun) on 2021-02-25 18:28:15 in reply to 2.1 [link]
We appreciate your feedback on our report. From our evaluation, we have other queries that also showed a slow-down. Based on your feedback, we will apply a filter rule to remove all queries that have the same reason described above, and then we will report new cases if they are not from the same root cause. Thanks, Jinho Jung
(4) By Keith Medcalf (kmedcalf) on 2021-02-25 18:54:34 in reply to 2.1 [link]
> Naive changes to support the two optimizations above fix most of these examples. But there are a couple of tricks. For example, query 1753 features the equivalent of: > > CREATE TABLE t1(a, b); > CREATE TABLE t2(x, y NOT NULL); > SELECT * FROM t1 LEFT JOIN t2 WHERE t1.a=t2.x AND y IS NULL; > > In the query, the LEFT JOIN can be transformed to a regular join (because t1.a=t2.x implies that t2.x cannot be NULL) which allows the database to determine that "y IS NULL" will always be false. But currently, by the time SQLite realizes it can transform the join it's tricky to tell exactly which LEFT JOIN (there may be many in the query) column "y" came from. I would say that this case (and the one following) are even simpler. There is no ON condition attached to the RHS table of the left join which means that the join type cannot be an outer join and must be an inner join and this can be done even before examination of the where clause.
(5) By Jinho Jung (jinhojun) on 2021-02-28 18:13:43 in reply to 4 [link]
Dan and Keith, We appreciate your feedback! Based on the comments above, we applied a filter to remove duplicated cases where "IS NULL" or "EXISTS (" caused the slowdown. Now we have a few queries. For example, this query runs x300 slower than other DBMSs. It would be great if you can confirm this one as another unique class so that we can add this pattern to our testing machine. ```sql select ref_1.o_w_id as c0 from main.customer as ref_0 inner join main.orders as ref_1 on (ref_0.c_credit_lim is not NULL) inner join main.warehouse as ref_2 on (ref_0.c_street_2 = ref_2.w_street_1 ) where (ref_0.c_delivery_cnt is not NULL) and (ref_1.o_entry_d is not NULL); ``` * Query files link: ```sh $ wget https://gts3.org/~jjung/report2/sq.tar.gz ``` * Execution result (execution time (second)) | Filename | Postgres | Mysql | Cockroachdb | Sqlite | Ratio | QuerySize | |---------:|---------:|---------:|------------:|---------:|---------:|---------:| | 22620 | 0.03366 | 0.026 | 0.00765 | 5.0 | 653.18 | 1319 | | 9681 | 0.00792 | 0.031 | 0.05196 | 5.0 | 631.14 | 484 | | 19401 | 0.02936 | 0.085 | 0.10491 | 5.0 | 170.28 | 475 | | 12652 | 0.01508 | 0.042 | 0.04319 | 5.0 | 331.54 | 307 | | 24172 | 0.00291 | 0.015 | 0.33168 | 2.223 | 763.38 | 704 |