SQLite User Forum

Is it possible to trim a TEXT field up to the first .!? or \n char?
Login

Is it possible to trim a TEXT field up to the first .!? or \n char?

(1) By Mark (m.n.summerfield) on 2025-05-14 07:44:33 [link] [source]

I want to create a view that shows the first line of a TEXT field that is likely to be multiline. Here's what I have so far:

CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Body TEXT NOT NULL
)

CREATE VIEW CardNames AS
    SELECT TRIM(LTRIM(LTRIM(SUBSTR(Body, 1, 20)), '#')) AS Name
    FROM Cards ORDER BY LOWER(Name);

The problem is that given a Body of say The Last Straw.\n\nThe yellow straw is the most common kind... the derived Name will contain a period and newlines that I don't want: in fact I'll get The Last Straw.\n\nT, when what I really want is just The Last Straw with no period or newlines. And given a Body of say This is a very long title that will just have to be trimmed I want to get back This is a very long, i.e., 19 chars since I want the whitespace trimmed.

So I want to keep the existing TRIM... but in addition make it trim the text to the first ., !, ?, or \n it finds.

Can this be done?

(2.1) By Aask (AAsk1902) on 2025-05-14 08:31:59 edited from 2.0 in reply to 1 [link] [source]

Try using INSTR to locate the cut off point; examples:

sqlite> with cte(Desc) as (values('Body of say The Last Straw.\n\nThe yellow straw is the most common kind... the derived Name will contain a period and newlines that I ...')) select substr(desc,1,instr(desc,'\n')-1) as Desc  from cte;
Desc
---------------------------
Body of say The Last Straw.
sqlite>


sqlite> with cte(Desc) as (values('Body of say The Last\n Straw.\n\nThe yellow straw is the most common kind... the derived Name will contain a period and newlines that I ...')) select substr(desc,1,instr(desc,'\n')-1)  from cte;
substr(desc,1,instr(desc,'\n')-1)
---------------------------------
Body of say The Last
sqlite>

More obvious with multiple rows:

with cte(Desc) as (values('The Last Straw.\n\nthat was the last'),('This is the final straw\nnote final')) select substr(desc,1,instr(desc,'\n')-1) from cte;
substr(desc,1,instr(desc,'\n')-1)
---------------------------------
The Last Straw.
This is the final straw

(3) By Mark (m.n.summerfield) on 2025-05-15 07:06:06 in reply to 2.1 [link] [source]

I've got a bit further but it still doesn't work.

DROP TABLE IF EXISTS Cards;
DROP VIEW IF EXISTS CardNames;

CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Body TEXT NOT NULL
);

CREATE VIEW CardNames AS
    -- SELECT SUBSTR(Name, 1, INSTR(Name, '\n')) -- SQL error
    SELECT Name -- No SQL error
    FROM (SELECT TRIM(LTRIM(LTRIM(SUBSTR(Body, 1, 20)), '#')) AS Name
            FROM Cards ORDER BY LOWER(Name));

INSERT INTO Cards (Body) VALUES (
    'The Last Straw.\n\nThe yellow straw is the most common kind');
INSERT INTO Cards (Body) VALUES (
    'This is a very long title that will just have to be trimmed');
INSERT INTO Cards (Body) VALUES (
    'This stops here! Even though there''s more');

SELECT Name from CardNames;

This outputs (where ^ stands for space at the end):

The Last Straw\n\nT
This is a very long^
This stops here! Eve

When what I want is:

The Last Straw
This is a very long
This stops here

with no trailing whitespace or .!? chars.

(4.2) By Aask (AAsk1902) on 2025-05-15 08:30:46 edited from 4.1 in reply to 3 [link] [source]

Your requirements have changed! IT appears you want to truncate at 20 characters OR at new line or punctuation characters if these occur sooner.

drop table if exists Cards;
CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Body TEXT NOT NULL
);

INSERT INTO Cards (Body) VALUES (
    'The Last Straw.\n\nThe yellow straw is the most common kind');
INSERT INTO Cards (Body) VALUES (
    'This is a very long title that will just have to be trimmed');
INSERT INTO Cards (Body) VALUES (
    'This stops here! Even though there''s more');

-- Truncate at NEWLINE ! or ?  or at 20 characters if missing them
SELECT TRIM(SUBSTR(body, 1, iif(0 != instr(body, '\n'), instr(body, '\n'), 0 != instr(body, '!'), instr(body, '!'), 0 != instr(body, '?'), instr(body, '?'), 20)-1)) as RS
FROM cards;
RS
-------------------
The Last Straw.
This is a very long
This stops here

It gets quite opaque ... consider using CASE WHEN insead of IIF as you might find it more readable.

(5.2) By Mark (m.n.summerfield) on 2025-05-15 10:31:47 edited from 5.1 in reply to 4.2 [link] [source]

Your answer put me on the right track.

I (again) slightly changed my spec so it truncates the Name at 20 chars or just before the newline if there's a newline before that or at the . ! or ? if one of these occurs before the 20 chars.

I also switched to CASE as you suggested.

Here's the result:

DROP TABLE IF EXISTS Cards;
DROP VIEW IF EXISTS CardNames;

CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Body TEXT NOT NULL
);

CREATE VIEW CardNames AS
    SELECT TRIM(SUBSTR(Body, 1,
        CASE
            WHEN 0 != INSTR(Body, CHAR(10)) THEN MIN(20, INSTR(Body, CHAR(10)) - 1)
            WHEN 0 != INSTR(Body, '.') THEN MIN(20, INSTR(Body, '.'))
            WHEN 0 != INSTR(Body, '!') THEN MIN(20, INSTR(Body, '!'))
            WHEN 0 != INSTR(Body, '?') THEN MIN(20, INSTR(Body, '?'))
            ELSE 20
        END)) AS Name FROM Cards ORDER BY LOWER(Name);

INSERT INTO Cards (Body) VALUES (
    'The Last Straw.' || CHAR(10) || CHAR(10) ||
    'yellow straw is the most common kind');
INSERT INTO Cards (Body) VALUES (
    'This is a very long title that will have to be trimmed' || CHAR(10));
INSERT INTO Cards (Body) VALUES (
    '# This stops here! Even though there''s more'); -- added leading # see below

SELECT Name, LENGTH(Name) from CardNames;

And here's the output:

|The Last Straw.|15|
|This is a very long|19|
|This stops here!|16|

Thank you!

I would also like to be able to trim leading #s from the result too.

I did this by renaming the above view to _CardNames and adding a new view:

CREATE VIEW CardNames AS SELECT TRIM(LTRIM(Name, '#')) AS Name FROM _CardNames;

(6) By Stephan Beal (stephan) on 2025-05-15 10:35:21 in reply to 5.2 [link] [source]

I (again) slightly changed my spec so it truncates the Name at 20 chars or just before the newline if there's a newline before that or at the . ! or ? if one of these occurs before the 20 chars.

Sidebar: if you're taking your input from HTML forms, it will (per the HTML spec) have CRNL line endings, which your solution does not account for. You may need an extra step to strip out the \r characters (either before or after the rest of your processing).

(7) By Bo Lindbergh (_blgl_) on 2025-05-15 11:56:04 in reply to 5.2 [link] [source]

That code assumes that the text contains at most one of the terminator characters. Try it on this example and you'll see what I mean:

Truncate here! Not here. Or here?

(8) By jchd (jchd18) on 2025-05-15 12:25:22 in reply to 7 [source]

My take:

DROP TABLE IF EXISTS Cards;
DROP VIEW IF EXISTS CardNames;


CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY NOT NULL,
    Body TEXT NOT NULL
);


CREATE VIEW CardNames AS
    SELECT
      TRIM(
        SUBSTR(Body, 1,
          MIN(
            20,
            INSTR(Body || CHAR(10), CHAR(10)) - 1,
            INSTR(Body || '.', '.') - 1,
            INSTR(Body || '!', '!') - 1,
            INSTR(Body || '?', '?') - 1
          )
        )
      ) AS Name
    FROM Cards ORDER BY LOWER(Name);


INSERT INTO Cards (Body) VALUES (
    'The Last Straw.' || CHAR(10) || CHAR(10) || 'yellow straw is the most common kind'), (
    'This is a very long title that will have to be trimmed' || CHAR(10)), (
    'This stops here! Even though there''s more'), (
    'Truncate here! Not here. Or here?');


SELECT Name, LENGTH(Name) from CardNames;
Name	            LENGTH(Name)
The Last Straw	        14
This is a very long	19
This stops here	        15
Truncate here	        13

(9) By Mark (m.n.summerfield) on 2025-05-15 15:18:42 in reply to 8 [link] [source]

Thanks for your help. I've now refined it (based on what you gave me above) to what I show at the end.

I don't care about CRLF since I'm actually storing CommonMark markdown text (hence removing leading #s) and using LF only.

I would like to be able to create a single view rather than a private one (_CardNames) and a public one (CardNames) but can't see how to do that.

Anyway, here's what I've got. The "Want" at the end matches the output I get.

DROP TABLE IF EXISTS Cards;
DROP VIEW IF EXISTS CardNames;
DROP VIEW IF EXISTS _CardNames;

CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Body TEXT NOT NULL
);

CREATE VIEW CardNames AS SELECT TRIM(LTRIM(Name, '#')) AS Name
    FROM _CardNames;

CREATE VIEW _CardNames AS
    SELECT TRIM((SUBSTR(Body, 1,
                        MIN(20,
                            INSTR(Body || CHAR(10), CHAR(10)) - 1,
                            INSTR(Body || '.', '.'),
                            INSTR(Body || '!', '!'),
                            INSTR(Body || '?', '?')
                        ))))
        AS Name FROM Cards ORDER BY LOWER(Name);

INSERT INTO Cards (Body) VALUES (
    'The Last Straw.' || CHAR(10) || CHAR(10) ||
    'yellow straw is the most common kind');
INSERT INTO Cards (Body) VALUES (
    'This is a very long title that will have to be trimmed' || CHAR(10));
INSERT INTO Cards (Body) VALUES (
    '# This stops here! Even though there''s more');
INSERT INTO Cards (Body) VALUES (
    'Truncate here! Not here. Or here?');

SELECT Name, LENGTH(Name) from CardNames;

-- Want
-- 'This stops here!'
-- 'The Last Straw.' (no newlines)
-- 'This is a very long' (no trailing space even though only 19 chars)
-- 'Truncate here!'

(10) By Bo Lindbergh (_blgl_) on 2025-05-15 16:58:26 in reply to 9 [link] [source]

You can use common table expressions as internal views.

Meanwhile, here's a data-driven approach:

-- These tables are constant, so define them outside the view.

create table terminate_before (
    term text not null);

insert into terminate_before
    values
        (char(10)),
        (char(13));

create table terminate_after (
    term text not null);

insert into terminate_after
    values
        ('.'),
        ('?'),
        ('!');

create view CardNames (cid, Name) as
    with
        -- Trim leading # or whitespace and enforce the max length.
        bases (cid, base) as
            (select cid, substr(ltrim(Body,char(9,10,13)||' #'),1,20)
                from Cards),

        -- Find all terminator positions,
        -- plus an extra position after the end
        -- in case none of the terminators occur.
        terms (cid, termpos, termlen) as
            (select cid, instr(base,term), 0
                from bases,
                    terminate_before
             union all select cid, instr(base,term), length(term)
                from bases,
                    terminate_after
             union all select cid, length(base)+1, 0
                from bases),

        -- Find the leftmost terminator position for each cid,
        -- ignoring any zero values from instr calls that found nothing.
        firstterms (cid, termpos) as
            (select cid, min(termpos+termlen)
                from terms
                where termpos>0
                group by cid)

    -- Truncate and then trim trailing whitespace.
    select cid, rtrim(substr(base,1,termpos-1))
        from bases
            natural join firstterms;

(11) By Mark (m.n.summerfield) on 2025-05-15 17:15:18 in reply to 10 [link] [source]

That's amazingly clever, but too advanced for me so I wouldn't be able to maintain it. I did realise however that I didn't need two views.

Here's a view that truncates to the first newline or after the first ., !, or ?, or the first 20 chars, whichever is the shorter, then trims whitespace, then trims any leading #s, then trims any leading whitespace:

DROP TABLE IF EXISTS Cards;
DROP VIEW IF EXISTS CardNames;

CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Body TEXT NOT NULL
);

CREATE VIEW CardNames AS
    SELECT LTRIM(LTRIM(TRIM((SUBSTR(Body, 1,
                        MIN(20,
                            INSTR(Body || CHAR(10), CHAR(10)) - 1,
                            INSTR(Body || '.', '.'),
                            INSTR(Body || '!', '!'),
                            INSTR(Body || '?', '?')
                        )))), '#'))
        AS Name FROM Cards ORDER BY LOWER(Name);

INSERT INTO Cards (Body) VALUES ('The Last Straw.' || CHAR(10) || CHAR(10) || 'yellow straw is the most common kind');
INSERT INTO Cards (Body) VALUES ('This is a very long title that will have to be trimmed' || CHAR(10));
INSERT INTO Cards (Body) VALUES ('# This stops here! Even though there''s more');
INSERT INTO Cards (Body) VALUES ('Truncate here! Not here. Or here?');

SELECT Name, LENGTH(Name) from CardNames;

-- Want
-- 'This stops here!'
-- 'The Last Straw.' (no newlines)
-- 'This is a very long' (no trailing space even though only 19 chars)
-- 'Truncate here!'

(12) By Mark (m.n.summerfield) on 2025-05-16 06:28:29 in reply to 11 [link] [source]

It occurred to me that this could be done as a generated field, e.g.,

DROP TABLE IF EXISTS Cards;

CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Name TEXT AS LTRIM(LTRIM(TRIM((SUBSTR(Body, 1,
                        MIN(20,
                            INSTR(Body || CHAR(10), CHAR(10)) - 1,
                            INSTR(Body || '.', '.'),
                            INSTR(Body || '!', '!'),
                            INSTR(Body || '?', '?')
                        )))), '#')) VIRTUAL,
    Body TEXT NOT NULL
);

INSERT INTO Cards (Body) VALUES (
    'The Last Straw.' || CHAR(10) || CHAR(10) ||
    'yellow straw is the most common kind');
INSERT INTO Cards (Body) VALUES (
    'This is a very long title that will have to be trimmed' || CHAR(10));
INSERT INTO Cards (Body) VALUES (
    '# This stops here! Even though there''s more');
INSERT INTO Cards (Body) VALUES (
    '## Truncate here! Not here. Or here?');

SELECT Name, LENGTH(Name) from Cards;

-- Want
-- 'This stops here!'
-- 'The Last Straw.' (no newlines)
-- 'This is a very long' (no trailing space even though only 19 chars)
-- 'Truncate here!'

But this fails with a syntax error:

Execution finished with errors.
Result: near "LTRIM": syntax error
At line 3:
CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Name TEXT AS LTRIM

Is using a generated field for this case possible?

(13.1) By Mark Lawrence (mark) on 2025-05-16 06:37:18 edited from 13.0 in reply to 12 [link] [source]

You are missing a pair of brackets around your definition. See column-constraint for the syntax.

(14) By Mark (m.n.summerfield) on 2025-05-16 10:38:01 in reply to 13.1 [link] [source]

You're quite right, it now works great. Thanks!

Corrected:

DROP TABLE IF EXISTS Cards;

CREATE TABLE Cards (
    cid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Name TEXT AS (LTRIM(LTRIM(TRIM((SUBSTR(Body, 1,
                        MIN(20,
                            INSTR(Body || CHAR(10), CHAR(10)) - 1,
                            INSTR(Body || '.', '.'),
                            INSTR(Body || '!', '!'),
                            INSTR(Body || '?', '?')
                        )))), '#'))) VIRTUAL,
    Body TEXT NOT NULL
);

INSERT INTO Cards (Body) VALUES (
    'The Last Straw.' || CHAR(10) || CHAR(10) ||
    'yellow straw is the most common kind');
INSERT INTO Cards (Body) VALUES (
    'This is a very long title that will have to be trimmed' || CHAR(10));
INSERT INTO Cards (Body) VALUES (
    '# This stops here! Even though there''s more');
INSERT INTO Cards (Body) VALUES (
    '## Truncate here! Not here. Or here?');

SELECT Name, LENGTH(Name) from Cards;

-- Want
-- 'This stops here!'
-- 'The Last Straw.' (no newlines)
-- 'This is a very long' (no trailing space even though only 19 chars)
-- 'Truncate here!'