SQLite Forum

Lemon - Unexpected frees

(1.1) By Sir Humphrey Appleby (SirHumphreyAppleby) on 2021-04-03 02:21:23 edited from 1.0

I am trying to use lemon to parse some simple rules and JSON. I assume I must be doing something fundamentally incorrect, but I haven't been able to determine what from the documentation. I would appreciate it if someone could assist and tell me what I'm doing wrong.

The rule processing appeared to function when I originally wrote it a few months ago, but didn't work when I tried to use the same code in a new project. I no longer have the original build environment to compare.

I am seeing similar oddities in a much simpler JSON parser I implemented using a similar technique. Specifically, while this simple example doesn't fail, I am seeing multiple attempts to free the same token, first using the token_destructor and then the default_destructor.

Parsing the simple string "{}\n", I see the following when I log memory allocations, token assignments/types and frees...

Alloc: 0x8005b0010 Bytes: 2 (json_parse)
Token: 0x8005b0010 Value: { Type: 2
Alloc: 0x8005b0018 Bytes: 2 (json_parse)
Token: 0x8005b0018 Value: } Type: 3
Alloc: 0x8005b0020 Bytes: 2 (json_parse)
Token: 0x8005b0020 Value: 
 Type: 1
Free: 0x8005b0010 { (token_destructor)
Free: 0x8005b0018 } (token_destructor)
Free: 0x8005b0010 { (default_destructor)
Free: 0x8005b0010 { (default_destructor)
Free: 0x8005b0020 
 (token_destructor)

Note that the address 0x8005b0010 is allocated once but passed to a destructor three times: once by the token_destructor and twice by the default_destructor.

The lemon .y file is as follows, processed using "lemon json_grammar.y" and uses lempar.c from 2 April:

%name JSON_Parse
%start_symbol json
%token_prefix JSON_
%token_type { uint8_t* }
%token_destructor { json_tok_free($$, "token_destructor"); }
%default_destructor { json_tok_free($$, "default_destructor"); }

%include {
#include <assert.h>
#include "json_common.h"
#include "lib.h"
}

%extra_argument { JSONParserStruct *ps }

%syntax_error {
    seterrormsg(&ps->error, "Syntax error, line %d", ps->scanner->line);
}

// json
json ::= json_endgame END_TOKEN.

// json_endgame
json_endgame ::= json_object. { json_finalise(ps); }
json_endgame ::= json_array. { json_finalise(ps); }

// json_object
json_object ::= LBRACE RBRACE. { json_push_group(ps, JSON_OBJECT, 0); }
json_object ::= LBRACE json_object_values RBRACE. { json_push_group(ps, JSON_OBJECT, 1); }

// json_object_values
json_object_values ::= json_assignment.
json_object_values ::= json_object_values COMMA json_assignment. { json_group_values(ps); }

// json_assignment
json_assignment ::= STRING(name) COLON json_value. { json_assign(ps, name); }

// json_array
json_array ::= LBRACKET RBRACKET. { json_push_group(ps, JSON_ARRAY, 0); }
json_array ::= LBRACKET json_array_values RBRACKET. { json_push_group(ps, JSON_ARRAY, 1); }

// json_array_values
json_array_values ::= json_value.
json_array_values ::= json_array_values COMMA json_value. { json_group_values(ps); }

// json_value
json_value ::= json_object.
json_value ::= json_array.
json_value ::= STRING(value). { json_push_value(ps, JSON_STRING, value); }
json_value ::= NUMBER(value). { json_push_value(ps, JSON_NUMBER, value); }
json_value ::= BOOLEAN(value). { json_push_value(ps, JSON_BOOLEAN, value); }
json_value ::= NULL(value). { json_push_value(ps, JSON_NULL, value); }

With ParserTrace on...

Alloc: 0x8005b0010 Bytes: 2 (json_parse)
Token: 0x8005b0010 Value: { Type: 2
parser >> Input 'LBRACE' in state 0
parser >> Shift 'LBRACE', go to state 4
parser >> Return. Stack=[LBRACE]
Alloc: 0x8005b0018 Bytes: 2 (json_parse)
Token: 0x8005b0018 Value: } Type: 3
parser >> Input 'RBRACE' in state 4
parser >> Shift 'RBRACE', pending reduce 2
parser >> Return. Stack=[LBRACE RBRACE]
Alloc: 0x8005b0020 Bytes: 2 (json_parse)
Token: 0x8005b0020 Value: 
 Type: 1
parser >> Input 'END_TOKEN' with pending reduce 2
parser >> Reduce 2 [json_object ::= LBRACE RBRACE], pop back to state 0.
Free: 0x8005b0010 { (token_destructor)
Free: 0x8005b0018 } (token_destructor)
parser >> ... then shift 'json_object', pending reduce 0
parser >> Reduce 0 [json_endgame ::= json_object], pop back to state 0.
Free: 0x8005b0010 { (default_destructor)
parser >> ... then shift 'json_endgame', go to state 10
parser >> Shift 'END_TOKEN', go to state 9
parser >> Return. Stack=[json_endgame END_TOKEN]
parser >> Input '$' in state 9
parser >> Reduce 13 [json ::= json_endgame END_TOKEN] without external action, pop back to state 0.
Free: 0x8005b0010 { (default_destructor)
Free: 0x8005b0020 
 (token_destructor)
parser >> ... then shift 'json', pending reduce -2
parser >> Accept!

(2) By Sir Humphrey Appleby (SirHumphreyAppleby) on 2021-04-05 21:58:38 in reply to 1.1

While I was confident there was no overwriting of data when extracting tokens, I removed re2c and all dynamic memory allocation from the picture altogether. A stack is allocated, but nothing is pushed to the stack in this simple example. Obviously I'm not freeing static values, but I am seeing multiple calls with the same address to the destructor functions even with this simple example.

#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memset */
#include "../mysrc/json_common.h"
#include "../mysrc/lib.h"

/* Entry points generated by lemon (%name JSON_Parse) */
void *JSON_ParseAlloc();
void JSON_Parse();
void JSON_ParseFree();

int main() {
    void *parser;
    struct JSONParserStruct ps;

    parser = JSON_ParseAlloc(malloc);
    memset(&ps, 0, sizeof(struct JSONParserStruct));
    ps.stack = malloc(JSON_STACK_SIZE);

    /* Tokens are static strings here, so the destructors must not free them */
    JSON_Parse(parser, JSON_LBRACE, "{", &ps);
    JSON_Parse(parser, JSON_RBRACE, "}", &ps);
    JSON_Parse(parser, JSON_END_TOKEN, "", &ps);
    JSON_Parse(parser, 0, 0, &ps);   /* end of input */
    JSON_ParseFree(parser, free);
    free(ps.stack);
    return 0;
}

(3) By Richard Hipp (drh) on 2021-04-06 10:14:01 in reply to 1.1

I'm sorry - I do not know what is going wrong with Lemon or your usage thereof.

To maximize the chance of getting help, I suggest you write a short standalone Lemon source file that demonstrates your problem. By "standalone" I mean:

  • Does not make use of your private infrastructure ("json_common.h" etc.)
  • Includes a tokenizer (or alternative input source) as part of the script, in a %code{} block.
  • Is something that we can just scrape from Forum into a file, run through Lemon, and then compile in order to see the malfunction for ourselves.

The information you have provided is a hint of the problem, but it does not demonstrate the problem. Significant work is required in order to decode your hint and figure out what is going wrong. If you want people to help you, you will stand a better chance if you reduce the barrier to entry by making it easy to reproduce the problem you are seeing.

(4) By Sir Humphrey Appleby (SirHumphreyAppleby) on 2021-04-06 18:30:39 in reply to 3

Thanks Richard. I assumed my understanding was fundamentally flawed and the issue would be apparent from the grammar rules. I have stripped everything out to create a simple example which shows the behaviour I am seeing.

// grammar.y

%start_symbol json
%token_type { uint8_t* }
%token_destructor { printf("Free: %p token_destructor\n", $$); }
%default_destructor { printf("Free: %p default_destructor\n", $$); }

%include {
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
}

%code {
void *ParseAlloc();
void Parse();
void ParseFree();

int main() {
    void *parser;

    parser = ParseAlloc(malloc);
    Parse(parser, LBRACE, "{");
    Parse(parser, RBRACE, "}");
    Parse(parser, END_TOKEN, "");
    Parse(parser, 0, 0);
    ParseFree(parser, free);
}
}

// json
json ::= json_endgame END_TOKEN.

// json_endgame
json_endgame ::= json_object. { }

// json_object
json_object ::= LBRACE RBRACE. { }



// Observed output

Free: 0x200630 token_destructor
Free: 0x20085e token_destructor
Free: 0x200630 default_destructor
Free: 0x200630 default_destructor
Free: 0x200a92 token_destructor