Ticket Hash: 67d2b0c1fe2291245cbd7ac5feb6295422e4b1c4
Title: Regression in the daemon caused by curl?
Status: Open
Type: Code_Defect
Severity: Minor
Priority: Immediate
Subsystem:
Resolution: Open
Last Modified: 2022-03-28 18:17:07 (3.20 years ago)
Created: 2021-12-14 17:17:41 (3.49 years ago)
Version Found In:
User Comments:
nick.tessier added on 2021-12-14 17:17:41:
Hi, feel free to treat this as low priority. The other changes we've requested from you can be put first.

I haven't actually tested this regression on anything but Windows, but I do think it only exists on Windows, since the majority of the changes involved surround WINSOCK. I've narrowed this down to a specific PR added in version 7.77 of curl. PR here. This PR was closed and not merged, but I believe it was rebased into master elsewhere.

Also, if it's any easier, I'll attach two copies of the same file, multi.c. These are the two copies I was working with while attempting to figure this out. One version works and one does not. As for the rest of curl, I downloaded curl 7.77 and swapped out the two multi.c files (one multi.c file is from a version before 7.77).

As for what actually goes wrong... Unfortunately, in general, the more I try to observe the problem the less it happens. If I slow things down and step through one by one it seems to function fine. Similarly, if I slow down the code by adding more logging or using a non-optimized build, the test also has a higher chance of succeeding. I can rerun the very same test back to back (the test attaches to a container, copies an existing db and makes changes to the copy) and sometimes get different errors each time, such as SQLITE_CANTOPEN or SQLITE_IOERR.

It seems like the sockets used for communication between the client and the daemon have trouble functioning properly as a result of this change. Most often I am hitting a failed receive in bcv_recv. First the daemon fails to receive, and then the client (as pictured in a call stack I'll attach) fails to receive a response to the message it had sent. Any idea what could be going wrong here?

nick.tessier added on 2022-03-28 13:04:19:
Hi, Could we bump the priority of this ticket?

dan added on 2022-03-28 15:59:29:
Hi, Can you try with this version:

https://sqlite.org/blockcachevfs/info/59506d815f342462

and "event" logging turned on in the daemon (i.e. "-log e") and send us the log? The change just logs out the winsock error after the failed recv() in the daemon process. Maybe it will be WSAEINTR or something:

https://docs.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2

Thanks, Dan.

nick.tessier added on 2022-03-28 16:19:59:
INFO(e) [2022/03/28 16:17:23.880]: listening on localhost port 2030
INFO(e) [2022/03/28 16:17:23.892]: install manifest for container imodelblocks-3cfe9b0d-c19e-48dd-a67d-8fb78c971d4b. versions=(BASELINE.bim=1, 912c1c83ef5529214b66a4bd6fca9c5e28d250ac.bim=1)
INFO(e) [2022/03/28 16:17:23.909]: POLL MANIFEST operation on container imodelblocks-3cfe9b0d-c19e-48dd-a67d-8fb78c971d4b starting
INFO(e) [2022/03/28 16:17:23.954]: install manifest for container imodelblocks-3cfe9b0d-c19e-48dd-a67d-8fb78c971d4b. versions=(BASELINE.bim=1, 912c1c83ef5529214b66a4bd6fca9c5e28d250ac.bim=1)
INFO(e) [2022/03/28 16:17:23.961]: POLL MANIFEST operation on container imodelblocks-3cfe9b0d-c19e-48dd-a67d-8fb78c971d4b finished - ok
INFO(e) [2022/03/28 16:17:26.375]: POLL MANIFEST operation on container imodelblocks-3cfe9b0d-c19e-48dd-a67d-8fb78c971d4b starting
INFO(e) [2022/03/28 16:17:26.378]: install manifest for container imodelblocks-3cfe9b0d-c19e-48dd-a67d-8fb78c971d4b. versions=(BASELINE.bim=1, 912c1c83ef5529214b66a4bd6fca9c5e28d250ac.bim=1)
INFO(e) [2022/03/28 16:17:26.384]: POLL MANIFEST operation on container imodelblocks-3cfe9b0d-c19e-48dd-a67d-8fb78c971d4b finished - ok
INFO(e) [2022/03/28 16:17:26.385]: WSAGetLastError() returns 0, disconnecting
INFO(e) [2022/03/28 16:17:26.386]: WSAGetLastError() returns 10035, disconnecting
Error Error: error opening iModel, unable to open database file

dan added on 2022-03-28 17:37:54:
WSAEWOULDBLOCK. I actually thought our local sockets were blocking, but the docs say that functions like WSAEventSelect "automatically sets socket s to nonblocking mode". We pass these sockets into Curl for event management, so I guess it's using one of those functions. Can you try with the following change?

https://sqlite.org/blockcachevfs/info/8954abdf10251071

This retries the recv() call if it fails with WSAEWOULDBLOCK, and assert()s that WSAPoll agrees there is data to read before we try to receive a message on a socket.

Thanks, Dan.

nick.tessier added on 2022-03-28 18:17:07:
This appears to have fixed the problem. Thank you!
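
For readers following along, the retry described above can be sketched roughly as below. This is not the actual blockcachevfs change. It is a minimal POSIX analog in which the helper names set_nonblocking() and recv_retry() are hypothetical, poll() stands in for WSAPoll, and an errno of EAGAIN/EWOULDBLOCK stands in for WSAGetLastError() returning WSAEWOULDBLOCK (10035).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. This
** imitates the side effect the ticket attributes to WSAEventSelect(). */
static int set_nonblocking(int fd){
  int flags = fcntl(fd, F_GETFL, 0);
  if( flags<0 ) return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read exactly nByte bytes from fd into buf, even if
** the socket is non-blocking. Returns 0 on success, or -1 on a real error
** or if the peer closes the connection before nByte bytes arrive. */
static int recv_retry(int fd, void *buf, size_t nByte){
  unsigned char *p = (unsigned char*)buf;
  size_t nDone = 0;
  assert( buf!=NULL || nByte==0 );
  while( nDone<nByte ){
    ssize_t n = recv(fd, p+nDone, nByte-nDone, 0);
    if( n>0 ){
      nDone += (size_t)n;          /* partial read: keep going */
    }else if( n==0 ){
      return -1;                   /* orderly shutdown by the peer */
    }else if( errno==EAGAIN || errno==EWOULDBLOCK ){
      /* No data yet. Wait until the socket is readable, then retry,
      ** instead of treating "would block" as a disconnect. */
      struct pollfd pfd;
      pfd.fd = fd;
      pfd.events = POLLIN;
      pfd.revents = 0;
      if( poll(&pfd, 1, -1)<0 && errno!=EINTR ) return -1;
    }else if( errno!=EINTR ){
      return -1;                   /* genuine socket error */
    }
  }
  return 0;
}
```

The point of the sketch is only that a "would block" result on a socket that has been silently switched to non-blocking mode means "try again later", not "the connection is gone", which matches the "returns 10035, disconnecting" line in the log above.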

Attachments:
- brokenmultihandle.h added by nick.tessier on 2021-12-14 17:19:59.
- brokenmulti.c added by nick.tessier on 2021-12-14 17:19:47.
- multihandle.h added by nick.tessier on 2021-12-14 17:18:58.
- multi.c added by nick.tessier on 2021-12-14 17:18:40.
- ExampleCallStackCurl.PNG added by nick.tessier on 2021-12-14 17:17:51.