Typos in SQLite source code

(1) By Josh (jsoref) on 2020-07-29 11:58:54 [link] [source]

I have a tool that makes it relatively easy to identify likely typos.

I'm open to providing a patch (series) to fix such typos, but as there are quite a few, I'd rather avoid doing the work if it'll be rejected offhand.

The first four are probably notable:

abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwyxz
abcdefghijklmnopqrstuvwyz

Thoughts:

I left the first bullet because many readers (myself included) would otherwise miss the oddities when seeing only one of the other bulleted items.
the second bullet seems fine and is similarly retained.
It's unclear to me why one would transpose two letters in that sequence (the third bullet) -- it's possible this is to avoid some optimization, so I'm not asserting that it's in fact wrong (if it's right, I think a comment explaining it would be helpful).
Afaict, the last of those appears to lead to an unexpected behavior involving an array.

In some cases, I've excluded spellings that I think are dominant, in other cases, I've left both potentially correct spellings in the list.

e.g. I selected pass-thrus over pass-throughs, although both should be replaced with one of passthrough or pass-thru (+s). The tool ignores the pass- portion of the word and just flags the second part.

Similarly there's a bit of a mix of highlite and highlight -- the latter is in dictionaries and thus should be preferred.

Another example involves references to Gutman's paper -- note the readme cites Atonin [sic; s.b. Antonin] Guttman.

Some items are probably technically abbreviations, but which I feel serve little purpose, e.g. Ellips where adding an extra letter shouldn't hurt anyone.

Numbers. English is a messy language: fourtynine should be written forty-nine...

The tool relies on some heuristics and occasionally misparses things (it has to try to decide if \n or \r are escapes or paths (it defaults to the former, which means it will make mistakes when it's wrong).

Conversely, it doesn't natively have a rule for %s which means it would naively complain about %spage_size -- I've tried to adjust for that here. I suspect I managed to overcorrect which means it's probably complaining about %string as if it were tring (I've included a suppression for that).

One item that is currently excluded (to reduce false-positives, but which is thus a false-negative):

Omiting

A handy way to look for these items (using the git repository, although one should be able to adapt it for fossil by swapping git ls-files for fossil ls):

files=$(git ls-files | egrep -v $(echo $(cat .github/actions/spelling/excludes.txt )|tr ' ' '|'))
patterns=$(perl -ne 'next if /^#/;next unless /./;print' .github/actions/spelling/patterns.txt |tr "\n" "|"|perl -pne 's/\|$//')
peek() { echo "$files"|tr "\n" "\0" | xargs -0 grep "$1"|perl -pne 's{'"$patterns"'}{}g' |uniq | grep --color=always "$1"; }

peek defaut:

src/build.c: const char *zEnd /* First character past end of defaut value text */

src/expr.c:** defautl collation sequence.

Can someone explain the best way forward/a likely way forward?

Thanks

(2) By Richard Hipp (drh) on 2020-07-29 16:22:37 in reply to 1 [source]

I put in hours of work making all the changes, now on a branch. What I found is that none of them seem to add much value to the code. But they are disruptive in that they make changes to files that have been otherwise unchanged for many years, and they seem likely to complicate merging and change tracking in the further. So the patch is now on a branch and my thinking at the moment is that it will likely stay there, never landing on trunk.

(3) By Andreas Kupries (andreas-kupries) on 2020-07-29 16:49:11 in reply to 2 [link] [source]

A compromise and intermediate way of fixing things might be to

apply the tool to all newly added/changed files as sqlite development goes forward and
fix all the typos it found in these files.

That way we should have less new typos going forward, and old typos are incrementally expunged as the files they are in get modified.

That should leave only the typos in files truly not changed for many years.

(4) By Andreas Kupries (andreas-kupries) on 2020-07-29 16:59:35 in reply to 1 [link] [source]

Is this tool open somewhere ? If yes, what is its license ?

(5) By Larry Brasfield (LarryBrasfield) on 2020-07-29 19:59:42 in reply to 3 [link] [source]

This makes a lot of sense. Good spelling in code comments is much like other whitespace conventions -- not worth an independent file change but harmless with slight benefit when made part of the check-in process.

The main (and slight) benefit, IMO, is reduction in "Typo found" forum posts. In my reading of SQLite code, I have found the comments to be quite useful with minimal distraction from spelling errors and hardly any significant grammar problems that impede comprehension. The imperfections are insignificant, and so I have never even considered reporting them.

(6) By anonymous on 2020-07-29 20:15:38 in reply to 4 [link] [source]

Maybe it's this: https://github.com/jsoref/spelling

(7) By Andreas Kupries (andreas-kupries) on 2020-07-29 20:28:52 in reply to 6 [link] [source]

Looks to be. Thank you.

(8) By Josh (jsoref) on 2020-07-30 00:39:23 in reply to 7 [link] [source]

Technically the fancy version is here: https://github.com/check-spelling/check-spelling but it's tied to GitHub Actions. If you're going to do something special, using the original tool should be good enough.

I'd be happy to help adapt it, but I suspect you'll be ok.

Thanks for listening. And thanks for sqlite

(9) By Josh (jsoref) on 2020-07-30 17:10:36 in reply to 2 [link] [source]

Wow, that's fast. It would have taken me much longer.

Fwiw, you can see the tool's output with that merged.

As to value, I definitely appreciate the apparent lack...

I should call out that first bullet set -- abcdefghijklmnopqrstuvwyz:

** Return values of columns for the row at which the vtablog_cursor
** is currently pointing.
...
static int vtablogColumn(
...
  int i                       /* Which column to return */
){
...
  if( i<26 ){
    sqlite3_snprintf(sizeof(zVal),zVal,"%c%d", 
                     "abcdefghijklmnopqrstuvwyz"[i], pCur->iRowid);

let i=25 (i<26):

> 'abcdefghijklmnopqrstuvwyz'[25]
undefined

This seems unlikely to be the desired behavior. (In C, that char isn't undefined, it's null, but that's probably worse.)