SQLite Forum

FTS5 error: database disk image is malformed
Login

FTS5 error: database disk image is malformed

(1) By Adam Sowa (strixcode) on 2022-02-03 17:52:59 [source]

Hello,

I'm trying to figure out why I'm getting a malformed database just by using the very basic functionality of the FTS5 extension.

I tested the below code on versions 3.37.2 and 3.35.5 with the same result.

Run: sqlite3.exe test.db < example.sql

example.sql:


DROP TABLE IF EXISTS user_fts;
DROP TABLE IF EXISTS user;

CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    bio TEXT
);

CREATE VIRTUAL TABLE user_fts USING fts5(
    bio,
    content='user',
    content_rowid='id'
);

CREATE TRIGGER user_ai AFTER INSERT ON user
    BEGIN
        INSERT INTO user_fts (rowid, bio) VALUES (new.id, new.bio);
    END;

INSERT INTO user (
    name,
    bio
) VALUES
    ('user_1', 'I am a senior designer with experience in Javascript'),
    ('user_2', 'I am a web developer working on Angular'),
    ('user_3', 'I am a backend junior developer and have worked in Finance'),
    ('user_4', 'I am a senior engineer, expert in concurrent backend system');

-- Remove the first user from the FTS index
INSERT INTO user_fts (user_fts, rowid, bio) VALUES ('delete', 1, 'I am a senior designer with experience in Javascript');

-- ERROR: database disk image is malformed (11)
INSERT INTO user_fts(user_fts, rank) VALUES('integrity-check', 1);

(2) By Richard Hipp (drh) on 2022-02-03 18:34:11 in reply to 1 [link] [source]

You made the choice to manage the content separately from the FTS index by using the "content=" option. Then you added rows to the content table, and your "user_ai" trigger also added those same rows to the FTS index. The content table and the FTS index agree and so all is well.

Then you deleted one row from the FTS index, but left that row in the content table. Now the FTS index and the content table are different. That is an error. If you add:

DELETE FROM user WHERE id=1;

Then the error will go away.

The "INSERT INTO user_fts(user_fts,...) VALUES('delete',...)" statement does not reach into the content table and automatically delete the content for you. The whole point of a separable content FTS index is so that you, the programmer, have complete control over both the content and the index. So if you change one, it becomes your responsibility to make a corresponding change to the other. If you want the content and the index to stay in sync automatically, don't use a separable content table (the "content=" option) and FTS5 will handle everything for you. By using "content=", you are essentially telling FTS5 that you are taking over responsibility to keep the content and the index in sync.

(3) By Adam Sowa (strixcode) on 2022-02-04 07:28:32 in reply to 2 [link] [source]

Thank you for your answer, my minimal example doesn't really represent what I wanted to achieve so let me explain because I'd like to understand if "content=" FTS table can support that.

Basically, I need to support "soft delete" feature, that is when a row is deleted from the separate content table (user), it is not actually deleted but only marked as deleted (ie. has extra column is_deleted)


ALTER TABLE user ADD COLUMN is_deleted INTEGER DEFAULT 0;

CREATE TRIGGER user_au AFTER UPDATE ON user
WHEN (old.is_deleted <> new.is_deleted AND new.is_deleted = 1)
BEGIN
  INSERT INTO user_fts (user_fts, rowid, bio) VALUES ('delete', new.id, new.bio);
END;

This is because I don't want the "soft deleted" users to be searchable by FTS. From your answer, I conclude that this is not possible to achieve with a separate content table?

(4) By Richard Hipp (drh) on 2022-02-04 12:22:53 in reply to 3 [link] [source]

I think it is possible to have more rows in the content table than you have in the index. FTS5 only uses the content for things like finding a snippet, and when it needs to look up all of the words in an entry when the entry is being deleted from the index. So having extra entries in the content table should be harmless. It just means that you will get reports of corruption when you use the 'integrity-check' feature. If you need to use 'integrity-check', just first remote all of the deleted entries from the content table (perhaps inside a transaction) then run the 'integrity-check' and then restore those rows (perhaps using ROLLBACK).

(5) By Dan Kennedy (dan) on 2022-02-04 21:29:44 in reply to 3 [link] [source]

If you need integrity-check to work (and linear scans of the fts5 table to omit the rows with is_deleted set), you could use a view as your content table:

CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    bio TEXT,
    is_deleted
);

CREATE VIEW userview AS SELECT id, name, bio FROM user WHERE is_deleted=0;

CREATE VIRTUAL TABLE user_fts USING fts5(
    bio,
    content='userview',
    content_rowid='id'
);