Division by decimals wrongly returning a rounded int result (works fine when casting to real)
(1) By Thiago Bellini Ribeiro (bellini) on 2021-06-18 14:37:33 [source]
Hey guys,
I just had this issue where a division by 2 decimals is rounding some numbers if they don't have a floating point in the actual value. Let me demonstrate the issue:
Here is the schema for my table:
sqlite> .schema product_product
CREATE TABLE IF NOT EXISTS "product_product" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "created_at" datetime NOT NULL, "updated_at" datetime NOT NULL, "name" varchar(255) NOT NULL, "price" decimal NOT NULL, "quantity" decimal NOT NULL, "depth" decimal NULL, "height" decimal NULL, "weight" decimal NULL, "width" decimal NULL, "freight_help" decimal NOT NULL, "ref" char(32) NOT NULL UNIQUE, "erp_id" varchar(255) NULL UNIQUE, "erp_synced_at" datetime NULL, "unit" varchar(2) NOT NULL, "is_active" bool NOT NULL, "kind" varchar(9) NOT NULL, "original_price" decimal NULL, "coffee_kind" varchar(7) NULL, "producer_id" integer NULL REFERENCES "product_producer" ("id") DEFERRABLE INITIALLY DEFERRED, "region_id" integer NULL REFERENCES "product_region" ("id") DEFERRABLE INITIALLY DEFERRED, "variety_id" integer NULL REFERENCES "product_variety" ("id") DEFERRABLE INITIALLY DEFERRED, "altitude" integer NULL, "characteristics" varchar(255) NULL, "score" integer NULL, "olfactory" varchar(255) NULL, "taste" varchar(255) NULL, "visual" varchar(255) NULL, "is_fresh" bool NOT NULL, "is_featured" bool NOT NULL, "description" text NOT NULL, "sca_report" varchar(2000) NULL, "stock" integer NULL, "slug" varchar(50) NOT NULL);
CREATE INDEX "product_product_is_active_79c22e04" ON "product_product" ("is_active");
CREATE INDEX "product_product_producer_id_48defcb2" ON "product_product" ("producer_id");
CREATE INDEX "product_product_region_id_8af442d6" ON "product_product" ("region_id");
CREATE INDEX "product_product_variety_id_899ca5d0" ON "product_product" ("variety_id");
CREATE INDEX "product_product_is_fresh_8e90cb7e" ON "product_product" ("is_fresh");
CREATE INDEX "product_product_is_featured_27b90897" ON "product_product" ("is_featured");
CREATE INDEX "product_product_slug_76cde0ae" ON "product_product" ("slug");
I want to select the price, original price and price / original_price (both columns of decimal
type). Here is what I get:
sqlite> SELECT price, original_price, price / coalesce(original_price, price) FROM product_product;
18.9||1.0
24.9||1.0
24.9||1.0
39.9|50|0.798
20||1
29.9||1.0
34.7||1.0
20||1
20|30|0
15|34|0
Notice that even though 39.9 / 50 gave me 0.798, 20 / 30 and 15 / 34 gave me 0. And as you can see on my schema, they are indeed decimals and not integers.
I did a test by forcing price to be casted to numeric (I tried decimal too):
sqlite> SELECT price, original_price, cast(price as numeric) / coalesce(original_price, price) FROM product_product;
18.9||1.0
24.9||1.0
24.9||1.0
39.9|50|0.798
20||1
29.9||1.0
34.7||1.0
20||1
20|30|0
15|34|0
Same result. The only way I got the real result from this was when I casted price to real:
sqlite> SELECT price, original_price, cast(price as real) / coalesce(original_price, price) FROM product_product;
18.9||1.0
24.9||1.0
24.9||1.0
39.9|50|0.798
20||1.0
29.9||1.0
34.7||1.0
20||1.0
20|30|0.666666666666667
15|34|0.441176470588235
AFAIK, even the result without cast should be working and giving the right result, right?
Btw, I'm using sqlite3 from the current debian unstable build:
❯ sqlite3 --version
3.34.1 2021-01-20 14:10:07 10e20c0b43500cfb9bbc0eaa061c57514f715d87238f4d835880cd846b9ealt1
(2.1) By Larry Brasfield (larrybr) on 2021-06-18 15:02:30 edited from 2.0 in reply to 1 [link] [source]
Your expectations need some adjustment to incorporate the fact that there is no "decimal" type in SQLite. I suggest a study of Affinity of expressions, at least. Better would be the whole page.
(Later addition in light of Richard's reply:)
When you write "decimal" (or "DECIMAL" or other casings thereof) as the column type, it resolves to NUMERIC per the 5th rule in section 2 of the doc we linked. There is no decimal type; I suspect Richard meant to say that values are stored into a NUMERIC column using an integer representation when they can be.
(3) By Richard Hipp (drh) on 2021-06-18 14:51:59 in reply to 1 [link] [source]
And as you can see on my schema, they are indeed decimals and not integers.
See https://www.sqlite.org/datatype3.html#affinity_name_examples. SQLite stores integers in a DECIMAL column when it can. To force floating-point values, you want to use REAL.
Also be mindful of the limitations of floating point numbers. Depending on your requirements, you might want to store values as in integer number of cents, then convert to dollars (or whatever currency you are using) in the application code.
(4) By Thiago Bellini Ribeiro (bellini) on 2021-06-18 15:00:28 in reply to 2.0 [link] [source]
Hi Larry.
Yes, I know that there's no decimal type on SQLite and that some types are actually aliases to others.
But looking at the link you gave me I'm surprised to know that DECIMAL is stored as NUMERIC instead of REAL. Is there any motivations in that? I would expect decimal to be something like real/float/double/etc.
Note that the only reason my column is decimal and not numeric is because it was generated by Django. In my django app I use sqlite for dev and postgresql for production, and I declare monetary fields as DecimalField, which translates to decimals here in sqlite. There's no RealField ASAIK and using FloatField (which would translate to REAL) is not an option because I would loose precision on postgresql.
(5) By Richard Hipp (drh) on 2021-06-18 15:07:34 in reply to 4 [link] [source]
You can perhaps argue that DECIMAL should translate into floating-point instead of numeric. But the time to have made that argument was 2004. That's now all water under the bridge. To convert DECIMAL to be floating point now, 17 years later, would be a breaking API change that could potential cause problems countless legacy applications.
(6) By Larry Brasfield (larrybr) on 2021-06-18 15:11:56 in reply to 4 [link] [source]
Hi, Thiago.
I think the idea behind that set of rules is to map the numerous typenames used in various SQL dialects to something reasonable. Of course, as your post shows, thinking on what that should mean varies.
Due to backward compatibility considerations, the present mapping is highly unlikely to change. So today's problem is how to adapt your generated SQL to SQLite's dynamic typing scheme.
While you do that, you will probably want to consider the decimal extension. I expect you will need some SQL transformation specialized for SQLite, and that extension may provide a useful part of such a solution.
Good luck.
(7) By Thiago Bellini Ribeiro (bellini) on 2021-06-18 15:17:12 in reply to 5 [link] [source]
Yeah, I get that this would be a breaking API change now...
Thank you both (drh and larrybr) for helping me figure out this.
I'll open an issue on Django regarding this, probably their Decimal field should be translating to numeric then.
(8) By Ryan Smith (cuz) on 2021-06-18 17:09:42 in reply to 1 [link] [source]
Hi Thiago, there's been good and accurate replies to this post, showing what is correct and why.
However, on the off chance you were just interested in a solution, consider the following sql where both table t's columns are INTeger (not even Decimal or Numeric) and see what works and do not work when forcing the calculation types:
CREATE TABLE t(a INT, b INT);
INSERT INTO t VALUES
(10,1)
,(10,2)
,(10,3)
,(10,4)
,(10,5)
,(10,6)
;
SELECT a / b FROM t; -- Integer division only
-- a / b
-- -----
-- 10
-- 5
-- 3
-- 2
-- 2
-- 1
SELECT CAST(a / b AS REAL) FROM t; -- Not useful
-- CAST(a / b AS REAL)
-- ---------
-- 10.0
-- 5.0
-- 3.0
-- 2.0
-- 2.0
-- 1.0
SELECT (CAST(a AS REAL) / b) FROM t; -- Casting one or both values
-- (CAST(a AS REAL) / b)
-- --------------------
-- 10.0
-- 5.0
-- 3.33333333333333
-- 2.5
-- 2.0
-- 1.66666666666667
SELECT ((a * 1.0) / b) FROM t; -- Using another calculation to force float
-- ((a * 1.0) / b)
-- --------------------
-- 10.0
-- 5.0
-- 3.33333333333333
-- 2.5
-- 2.0
-- 1.66666666666667
SELECT (a / (b * 1.0)) FROM t; -- etc.
-- (a / (b * 1.0))
-- --------------------
-- 10.0
-- 5.0
-- 3.33333333333333
-- 2.5
-- 2.0
-- 1.66666666666667
Perhaps an adjusted query is all you need, if other fixes are off the table / slow.
(9) By Thiago Bellini Ribeiro (bellini) on 2021-06-18 17:50:30 in reply to 8 [link] [source]
Hey Ryan,
Thanks for the reply!
Yeah, I notice that some casts or even some multiplications helped on sql directly, but I can't even do some of those castings on django itself. I ended up opening an issue there so they can maybe workaround this issue.
The issue here: https://code.djangoproject.com/ticket/32861#ticket
(10) By Ryan Smith (cuz) on 2021-06-18 18:02:23 in reply to 9 [link] [source]
Seriously?
Django won't let you state a query like this:
SELECT (a * 1.0) / b FROM t;
?
If so, invest your time in another framework asap.
(11) By Scott Robison (casaderobison) on 2021-06-18 20:19:18 in reply to 10 [link] [source]
I think the issue is that Django abstracts away the database access completely so that it is not a concern of the programmer, so you don't really have the option of telling it what you want it to do with this or that database engine.
(12) By Thiago Bellini Ribeiro (bellini) on 2021-06-18 21:22:49 in reply to 11 [link] [source]
Yeah, exactly that.
But I can actually so some "advanced stuff" like that. For example, I can make that multiplication that you described like this:
T.objects.annotate(
foo=(Value(Decimal("1.0")) * F("a")) / F("b"),
)
This translates exactly to your suggestion, but based on the fact that django abstracts it I would not expect to need to cast anything to decimal since my columns are already decimal (which I know understand maps to NUMERIC and not REAL). If I was using postgresql/mysql/etc (which I use in production, postgresql), I would not need that workaround.
Note that I use DecimalField (https://docs.djangoproject.com/en/3.2/ref/models/fields/#decimalfield) instead of FloatField on django because it uses Decimal on python and decimal types on postgresql, making sure I don't run into floating point rounding issues.
Also, the documentation on django seems wrong since they say that DecimalField uses REAL for sqlite (https://docs.djangoproject.com/en/3.2/ref/databases/#sqlite-decimal-handling). So, or this doc is wrong or the implementation is wrong, something that I'll also write now on the issue I opened there.
(13) By Ryan Smith (cuz) on 2021-06-19 00:40:32 in reply to 12 [link] [source]
I'm not sure I ever advised casting to decimal, I did say casting to REAL is one possible workaround, which I'm sure would work fine in Postgres but might not in MySQL.
That said, I agree you do not want to have unneeded code like that ending up in your production environment.
May I ask why you are not using the intended production DB while developing? It's very trivial to install MySQL or Postgres on a local machine, or some networked PC standing around the office - even on Windows machines.
I mean I'm obviously a fan of SQLite myself, but I use it in production environments mostly. I can't imagine using it as a stand-in for PG or MySQL when the eventual intent is to use one of those. What would be the rationale for that, other than having an extremely crappy dev machine?
(14) By Thiago Bellini Ribeiro (bellini) on 2021-06-20 13:03:22 in reply to 13 [link] [source]
Yes, to cast to REAL using the code that I provided I would have to use FloatField, which on postgres would really convert to float and make me lose precision.
And I actually use postgres on my local dev machine, but I added the possibility of using sqlite for some coworkers that are frontend only and don't want to setup any database locally, they just want the backend "to work" for they to do their job. And since django abstracts the db for the db stuff, it usually works. I have some ifs/elses here and there for things like full text search and stuff, but this issue specifically got me off guard.