SQLite Forum

Limiting import of large files to specific number of lines

(1) By Joe (jmwatx) on 2020-09-29 21:10:34

If you're importing large CSV/text files on a regular basis, it'd be nice to be able to limit the number of rows imported, just for testing purposes. Say you only want to import the top 1000 rows of a file to see the header and what the data looks like.

Is there a built-in way to do this that I'm unaware of? If not, it might be nice to have an option on the CLI's .import command, such as "--limit 1000" or something like that.

Currently I'm having to run "head -n 1000 file" and then have SQLite import the truncated file, but it'd be easier if I could just apply the limit within the CLI instead.
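
For concreteness, the workaround looks roughly like the sketch below (the file, database, and table names are just placeholders; only head and the standard .mode/.import dot-commands are used):

    # pre-truncate the CSV with head, then import the small copy
    head -n 1000 big.csv > sample.csv
    sqlite3 test.db <<'EOF'
    .mode csv
    .import sample.csv sample_table
    EOF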

(2) By Roman (moskvich412) on 2020-09-29 23:02:52 in reply to 1

Dear Joe,

I am not a developer, but I think the functionality you request could be useful for some tests. However, logically, the order of rows in a table is irrelevant, so specifying "the first 1000 rows" instead of an explicit WHERE clause is ambiguous. You would be relying on the process that prepared the CSV file to make the first 1000 rows contain what you want. And if you control that process, the same preparation step could limit the rows itself.

The situation is more difficult if you are loading several CSV files that are linked by a FOREIGN KEY. There is no guarantee that the first 1000 rows of one table satisfy the foreign key constraint against the first 1000 rows of another table. Again, you would have to rely on the process that prepared the CSV files, and again, that is the process that could apply a limit meaningfully.

Therefore, logically, the limit must be applied at the level of producing the CSV files, not at the level of importing them. The same idea appears in SQL: LIMIT can be added at the end of a SELECT, but not at the end of an INSERT.
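
As a sketch of what I mean (the table and file names are only examples): if the data comes out of another SQLite database, the limit belongs in the SELECT that produces the CSV, for instance using the CLI's .once command to redirect a single query's output:

    -- produce a limited CSV at the source instead of limiting the import
    .mode csv
    .headers on
    .once sample.csv
    SELECT * FROM src LIMIT 1000;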

Roman

(3) By Joe (jmwatx) on 2020-09-30 00:38:39 in reply to 2

Importing into a table with foreign key constraints has worked for me in the past when importing a specific subset of the data without taking all of the file - and in that case SQLite works perfectly.

I'm speaking more about quickly looking at the data's structure rather than the values themselves. In that case, a way to take the first N rows from the file for a quick examination might be useful. Rather than reading in all 20 GB of a file, you could read in a small part of it to get an idea of what the data looks like.
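
For that kind of quick look, a throwaway in-memory database works well. The sketch below uses made-up file and table names and only standard shell and CLI features; in CSV mode, .import creates the table from the header row when it doesn't already exist:

    # import a small sample into an in-memory database and inspect its structure
    head -n 1000 huge.csv > sample.csv
    sqlite3 :memory: <<'EOF'
    .mode csv
    .import sample.csv preview
    .schema preview
    SELECT * FROM preview LIMIT 5;
    EOF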