SQLite

View Ticket
Login
Ticket Hash: accfef33172b956d4e6c01ac047e046ea02dd0a3
Title: Unicode output from the CLI subprocess not displayed correctly on Windows.
Status: Closed Type: Support_Request
Severity: Cosmetic Priority: Low
Subsystem: Other Resolution: Not_A_Bug
Last Modified: 2024-09-16 23:34:11
240.1 days ago
Created: 2024-04-29 17:44:43
1.04 years ago
Version Found In:
User Comments:
drh added on 2024-04-29 17:44:43:

Repro: Compile and run the source code below, altering the path to sqlite3.exe as appropriate for your system. This code is based on the public sample for executing a child process with redirected output.

Expected output: The Unicode text provided

Actual output: Corrupted text at the end

#include <io.h>
#include <fcntl.h>
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <strsafe.h>
#define BUFSIZE 8000 // Increased to show this isn't the error
 
HANDLE g_hChildStd_IN_Rd = NULL;
HANDLE g_hChildStd_IN_Wr = NULL;
HANDLE g_hChildStd_OUT_Rd = NULL;
HANDLE g_hChildStd_OUT_Wr = NULL;
void CreateChildProcess(void);
void ReadFromPipe(void);
void ErrorExit(PCTSTR);
int _tmain(int /*argc*/, TCHAR* /*argv*/[])
{
    _setmode(_fileno(stdin), _O_U8TEXT);
    _setmode(_fileno(stdout), _O_U8TEXT);
    _setmode(_fileno(stderr), _O_U8TEXT);
    SECURITY_ATTRIBUTES saAttr;
    wprintf(L"\n->Start of parent execution.\n");
    // Set the bInheritHandle flag so pipe handles are inherited.
    saAttr.nLength = sizeof(SECURITY_ATTRIBUTES);
    saAttr.bInheritHandle = TRUE;
    saAttr.lpSecurityDescriptor = NULL;
    // Create a pipe for the child process's STDOUT.
    if (!CreatePipe(&g_hChildStd_OUT_Rd, &g_hChildStd_OUT_Wr, &saAttr, 0))
        ErrorExit(TEXT("StdoutRd CreatePipe"));
    // Ensure the read handle to the pipe for STDOUT is not inherited.
    if (!SetHandleInformation(g_hChildStd_OUT_Rd, HANDLE_FLAG_INHERIT, 0))
        ErrorExit(TEXT("Stdout SetHandleInformation"));
    // Create a pipe for the child process's STDIN.
    if (!CreatePipe(&g_hChildStd_IN_Rd, &g_hChildStd_IN_Wr, &saAttr, 0))
        ErrorExit(TEXT("Stdin CreatePipe"));
    // Ensure the write handle to the pipe for STDIN is not inherited.
    if (!SetHandleInformation(g_hChildStd_IN_Wr, HANDLE_FLAG_INHERIT, 0))
        ErrorExit(TEXT("Stdin SetHandleInformation"));
    // Create the child process.
    CreateChildProcess();
    // Close the child process's input pipe so it stops reading.
    if (!CloseHandle(g_hChildStd_IN_Wr))
        ErrorExit(TEXT("StdInWr CloseHandle"));
    // Read from pipe that is the standard output for child process.
    wprintf(L"\n->Contents of child process STDOUT:\n\n");
    ReadFromPipe();
    wprintf(L"\n->End of parent execution.\n");
    // The remaining open handles are cleaned up when this process terminates.
    // To avoid resource leaks in a larger application, close handles explicitly.
    return 0;
}

void CreateChildProcess()
// Create a child process that uses the previously created pipes for STDIN and STDOUT.
{
    TCHAR szCmdline[] = TEXT(
        "\"C:\\Program Files\\SQLite\\sqlite3.exe\" --cmd \"SELECT "
        "'aAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAA "
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        "AAAAAAAAAA';\"");
    // Normal output works:
     //TCHAR szCmdline[] = TEXT(
     //    "\"C:\\Windows\\System32\\cmd.exe\" /c echo hi");
    PROCESS_INFORMATION piProcInfo;
    STARTUPINFO siStartInfo;
    BOOL bSuccess = FALSE;
    // Set up members of the PROCESS_INFORMATION structure.
    ZeroMemory(&piProcInfo, sizeof(PROCESS_INFORMATION));
    // Set up members of the STARTUPINFO structure.
    // This structure specifies the STDIN and STDOUT handles for redirection.
    ZeroMemory(&siStartInfo, sizeof(STARTUPINFO));
    siStartInfo.cb = sizeof(STARTUPINFO);
    siStartInfo.hStdError = g_hChildStd_OUT_Wr;
    siStartInfo.hStdOutput = g_hChildStd_OUT_Wr;
    siStartInfo.hStdInput = g_hChildStd_IN_Rd;
    siStartInfo.dwFlags |= STARTF_USESTDHANDLES;
    // Create the child process.
    bSuccess = CreateProcess(NULL,
        szCmdline, // command line
        NULL, // process security attributes
        NULL, // primary thread security attributes
        TRUE, // handles are inherited
        0, // creation flags
        NULL, // use parent's environment
        NULL, // use parent's current directory
        &siStartInfo, // STARTUPINFO pointer
        &piProcInfo); // receives PROCESS_INFORMATION
    // If an error occurs, exit the application.
    if (!bSuccess)
        ErrorExit(TEXT("CreateProcess"));
    else
    {
        // Close handles to the child process and its primary thread.
        // Some applications might keep these handles to monitor the status
        // of the child process, for example.
        CloseHandle(piProcInfo.hProcess);
        CloseHandle(piProcInfo.hThread);
        // Close handles to the stdin and stdout pipes no longer needed by the child process.
        // If they are not explicitly closed, there is no way to recognize that the child process
has ended.
        CloseHandle(g_hChildStd_OUT_Wr);
        CloseHandle(g_hChildStd_IN_Rd);
    }
}

void ReadFromPipe(void)
// Read output from the child process's pipe for STDOUT
// and write to the parent process's STDOUT.
// Stop when there is no more data.
{
    DWORD dwRead, dwWritten;
    CHAR chBuf[BUFSIZE];
    BOOL bSuccess = FALSE;
    for (;;)
    {
        bSuccess = ReadFile(g_hChildStd_OUT_Rd, chBuf, BUFSIZE, &dwRead, NULL);
        if (!bSuccess || dwRead == 0)
            break;
        WCHAR outBuf[BUFSIZE];
        dwWritten = MultiByteToWideChar(CP_UTF8, 0, chBuf, dwRead, outBuf, BUFSIZE);
        outBuf[dwWritten] = 0;
        wprintf(L"%s", outBuf);
    }
}

void ErrorExit(PCTSTR lpszFunction)
// Format a readable error message, display a message box,
// and exit from the application.
{
    LPVOID lpMsgBuf;
    LPVOID lpDisplayBuf;
    DWORD dw = GetLastError();
    FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
        FORMAT_MESSAGE_IGNORE_INSERTS, NULL, dw,
        MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), (LPTSTR)&lpMsgBuf, 0, NULL);
    lpDisplayBuf = (LPVOID)LocalAlloc(
        LMEM_ZEROINIT,
        (lstrlen((LPCTSTR)lpMsgBuf) + lstrlen((LPCTSTR)lpszFunction) + 40) * sizeof(TCHAR));
    StringCchPrintf((LPTSTR)lpDisplayBuf, 
        LocalSize(lpDisplayBuf) / sizeof(TCHAR),
        TEXT("%s failed with error %d: %s"),
        lpszFunction, dw, lpMsgBuf);
    MessageBox(NULL, (LPCTSTR)lpDisplayBuf, TEXT("Error"), MB_OK);
    LocalFree(lpMsgBuf);
    LocalFree(lpDisplayBuf);
    ExitProcess(1);
}

larrybr added on 2024-05-05 14:11:50:

This anomaly does not represent a SQLite bug.

The provided test program, in its subroutine ReadFromPipe(), purports to translate the UTF-8 stream read from a pipe, (which is fed by a SQLite process), to a WCHAR buffer. This translation is done in batches, sized according to how many chars the OS elects to provide to a ReadFile() call on the pipe. This procedure is prone to slicing off part of the last UTF-8 character at the end, remainder of which is correctly provided at the beginning of the next pipe read. Forcing the interpretation, as UTF-8, of these character fragments yields the "invalid character" results seen.


larrybr added on 2024-05-19 18:29:13:

In further discussion with the person who reported this anomaly, (D.M. of Microsoft), the following facts and suppositions have become evident:

  1. The character-slicing behavior might be attributed to the SQLite CLI, based upon the following observations.

  2. If the affected FILE* output stream had been subject to _setmode(..., _O_U8TEXT), and wide print functions (such as wprintf(...)) are used to write its content, then the C runtime (as provided by Microsoft for some time now) takes care to avoid such slicing in its own sequences of Win32 WriteFile(...) calls. (This, from D.M., is the thrust of said further discussion.)

  3. To exploit this particular slicing avoidance requires that the UTF-8 text emitted by the CLI to non-console streams be translated to UTF-16 so that the wide print functions can be used.

  4. This slicing avoidance has these user-visible benefits: (a) Writes (to files) interrupted by a crash are less likely to have a (sliced) non-UTF-8 character at the end; and (b) content that might be monitored before it is complete (such as in the piping scenario of this ticket) is less likely to have an observable non-UTF-8 character fragment at momentary "ends".

Weighing against modification of the CLI (and other console apps) to attain those benefits, these considerations apply:

  1. The CreatePipe() and WriteFile() APIs provide no way to ensure that UTF-8 character slicing does not occur within their own buffering operation. (In other words, the slicing avoidance is more a reduction.)

  2. Regardless of OS and application-level slicing avoidance, non-volatile storage hardware enforces block writing (rather than character writing), so the benefit is not reliably attained for files.

  3. Some operating systems do not support setmode(..., O_U8TEXT), so this unreliably attained benefit would accrue only on the Windows platform, and this would require even more OS-specific code (than is already present to support console I/O.)

This addendum author (larrybr) is not yet convinced that the benefit of more platform-specific and performance-degrading output stream code is worth the costs. So, at present, this ticket remains closed. If such modification is undertaken, an effort should be made to ensure that the code remains buildable using non-Microsoft toolchains (or more preprocessing complexity is borne to ensure such.)


drh added on 2024-09-16 23:34:11:

I don't think the test program is correct.

The command-line that gets processed (the szCmdline variable) appears to be UTF-8. All those things that look like "A" characters, are really U+ff21 - double-width "A" characters. They are each encoded as three bytes: 0xef 0xbc 0xa1.

The CreateProcess() function to which this variable is the first argument, apparently expects the text to be plain old ASCII. It appears to merely interleave 0x00 bytes to turn that into UTF16. Or something does that. In any event, by the time the arguments make it into the sqlite3.exe command-line shell they are in UTF16 and get translated back into UTF8 by the code on lines 12446 through 12470.

Because of this sequence of translations, each U+ff21 character ends up being three separate characters: U+00ef, U+00bc, and U+00a1. Each byte of the command-line sent into CreateProcess() got processed into a separate UTF16 character before it was passed into sqlite3.exe, still as UTF16. Then sqlite3.exe dutifully translated those UTF16 characters back into UTF8.

The output from sqlite3.exe that comes back through the pipe appears to be correct, given this sequence of translations. Is it gibberish? Yes, it is, exactly the same gibberish that was passed into sqlite3.exe on the command-line. So "sqlite3.exe" appears to be doing the right thing.

You can more clearly see what is happening by changing the command passed into SQLite into this:

SELECT hex('aAz'), 'aAz';

Those "A" characters are really the U+ff21 double-wide A's. In the output from the SQL hex() function, you can clearly see that the input string literal is all messed up. And the string literal does get mangled on output. But if you change the input to the equivalent:

SELECT hex(char(97,65313,122)), char(97,65313,122);

Then you get exactly the output you want through the pipe.

So, as far as I can tell, there is nothing wrong with sqlite3.exe. The problem is in the test program.