sqlite DB + data file - write out all entries

(1) By mortenbo on 2021-08-17 13:13:45 [link] [source]

I have purchased a cdrom with language dictionaries, English-Danish and Danish-English that are stored in a sqlite database and an accompanying binary data file. I would like to write out all the entries and store them in a clear text file. Is there some way I could do that?

Thanks, Morten

(2) By Ryan Smith (cuz) on 2021-08-17 14:35:13 in reply to 1 [link] [source]

Probably - Depending on what "accompanying binary datafile" actually means.

Could you be more specific? Perhaps upload the DB file somewhere so we can see the format tell you how to write it out?

If you do not wish to upload it - What OS do you use? If Linux then "DB Browser for SQLite" would probably make it easy, if Windows you can try SQLiteSpeed.com or just google "sqlite db admin" or such. Another option is dumping from the CLI and sending us the schema.

Whichever you choose, we need a lot more knowledge about the DB schema and how you want to see it to form useful queries to "write it out".

(3) By mortenbo on 2021-08-17 14:50:50 in reply to 2 [link] [source]

I have uploaded them at:

http://mbjnet.dk/Engelsk.dat (data file) http://mbjnet.dk/Engelsk.gdb (sqlite DB)

I am on Linux and I have opened the DB in sqlitebrowser, but that only shows the structure of the DB not the actual entries which are in the data file.

The reason I would like to do it is that the program on the cdrom to look up translations is a rather clunky GUI. If I could get all the entries, neatly formatted over in a text file, I could make a script to look up translations much easier in a terminal.

Thanks, Morten

(4) By mortenbo on 2021-08-17 15:38:18 in reply to 3 [link] [source]

PS: I do notice that in sqlitebrowser I can export tables as csv files and thereby obtain some of the data. But that data is fragmented and should be combined in some way to form the actual dictionary entries. I don't know how to do that.

Thanks, Morten

(5) By RandomCoder on 2021-08-17 16:21:31 in reply to 3 [link] [source]

The .dat file doesn't include any obvious formats, and the database has no hints as to it's format. I'm sorry to say, unless someone here decides to reverse engineer the format, you're probably best off finding a forum for the application meant to read it and ask if anyone is aware of documentation for the file's data format.

(7) By mortenbo on 2021-08-17 17:10:23 in reply to 5 [link] [source]

Thanks, but the .dat file is handled by the database, so I can do e.g.:

sqlite> .open Engelsk.gdb

sqlite> .tables collocation_lookup1 entries2 reverse1
collocation_lookup2 info reverse2
dict_setup lookup1 sound
entries1 lookup2

SELECT * FROM lookup1; .... 355644|vinduespolerernes 355645|vinduespudsers 355646|vinduespudseren 355647|vinduespudseren ....

All the lemmas from "lookup1" are output. The English lemmas are in "reverse1" and other information about the DB are in the other tables. I suppose I need to form a query to print out an actual dictionary entry, which consists of the word to translate and its translation and iterate over all the lemmas. Could anyone help me in doing that? I am a database novice.

Thanks, Morten

(6) By Ryan Smith (cuz) on 2021-08-17 17:04:27 in reply to 3 [link] [source]

Yeah, I'm with Random on that matter.

What I've figured out is that all you need should be in the DB file (Engelsk.gdb). The .dat file contains what seems like byte streams all put together with the DB having entries that refer to the byte offset and length/size of the streams. The first streams may be pictures or so, but I'm not sure. Later there are streams that are the sound data.

Either way, the tables seem to link English and Dansk together in a way that is not immediately obvious to me. One way I've been able to get some stuff out that seems to make some sense is with this query:

-- Dansk --> English
SELECT COALESCE(DN0.word_,DN1.word_) AS Dansk_Word, group_concat(ENG.word_,', ') AS English
  FROM      entries1            AS D2E
  LEFT JOIN lookup1             AS DN0 ON DN0.entry_id_ = D2E.id_
  LEFT JOIN collocation_lookup1 AS DN1 ON DN1.entry_id_ = D2E.id_
  LEFT JOIN reverse1            AS ENG ON ENG.entry_id_ = D2E.id_
 WHERE 1
 GROUP BY D2E.id_
 ORDER BY COALESCE(DN0.word_,DN1.word_)
 LIMIT 20;

  -- Exerpt from Results:
  -- Dansk_Word         |English                                                                                                                         
  -- -------------------|-----------------------------------------------------
  -- accept             |acceptance, confirmation                                                                                                        
  -- accept             |accept, accept, accept, accept, accept, accept, accept, accept                                                                  
  -- accept             |non-acceptance, non-acceptance, non-acceptance, non-acceptance, non-acceptance, non-acceptance, non-acceptance, non-acceptance  
  -- acceptabel         |acceptable                                                                                                                      
  -- acceptabel         |acceptable to                                                                                                                   
  -- acceptabilitet     |acceptability                                                                                                                   
  -- acceptabiliteten   |NULL                                                                                                                            
  -- acceptabilitetens  |NULL                                                                                                                            
  -- acceptabilitets    |NULL                                                                                                                            
  -- acceptant          |acceptor                                                                                                                        
  -- acceptanten        |NULL                                                                                                                            
  -- acceptantens       |NULL                                                                                                                            
  -- acceptanter        |NULL

and this one for the opposite:

-- English --> Dansk
SELECT COALESCE(EN0.word_,EN1.word_) AS English_Word, group_concat(DAN.word_,', ') AS Dansk
  FROM      entries2            AS E2D
  LEFT JOIN lookup2             AS EN0 ON EN0.entry_id_ = E2D.id_
  LEFT JOIN collocation_lookup2 AS EN1 ON EN1.entry_id_ = E2D.id_
  LEFT JOIN reverse2            AS DAN ON DAN.entry_id_ = E2D.id_
 WHERE 1
 GROUP BY E2D.id_
 ORDER BY COALESCE(EN0.word_,EN1.word_)
 LIMIT 20;


  -- Exerpt from Results:
  -- English_Wo-|                                                                                              
  -- rd         |Dansk                                                                                         
  -- -----------|--------------------------------------------------------------
  -- ad-lib     |improvisation                                                                                 
  -- ad-lib     |improviseret                                                                                  
  -- ad-lib     |improvisere                                                                                   
  -- adage      |mundheld, ordsprog, talemåde                                                                  
  -- adam       |adam                                                                                          
  -- adam       |jeg aner ikke hvem han er, jeg kender ham slet ikke                                           
  -- adam       |adamsæble                                                                                     
  -- adamant    |ikke lade sig rokke, være ubøjelig                                                            
  -- adamant    |ikke lade sig rokke hvad angår, ikke lade sig rokke med hensyn til                            
  -- adamant    |ikke lade sig rokke hvad angår, ikke lade sig rokke med hensyn til                            
  -- adamant    |holde stejlt på at, være fast besluttet på at                                                 
  -- adamantly  |være absolut imod, være en absolut urokkelig modstander af                                    
  -- adams      |nul og nix, slet ingenting                                                                    
  -- adapt      |afpasse, bearbejde, indrette, indrette sig, omarbejde, tilpasse, tilpasse sig, tilrettelægge  
  -- adapt      |tilpasse til                                                                                  
  -- adapt      |indrette efter                                                                                
  -- adapt      |indrette til

etc.

This may be a sqlite DB they use, but that format is not a well-known or open format (at least not one I am aware of)

Good luck on that.

(8) By mortenbo on 2021-08-17 17:20:09 in reply to 6 [source]

That is at least useful. Thanks Ryan!

(9) By anonymous on 2024-07-07 19:40:35 in reply to 6 [link] [source]

I know it's been while, but still...

I am also trying to rip this dictionary. And no, the DB file is not enough as it only contains the headers of the entries. The actual bodies appear to be in the DAT file (those first binary streams). But I have no idea how to decipher them.

What you have got from the DB are headers to search for collocations, reverse dictionary search etc., not the actual entries.