Logfile name datetime interpolation
(1) By anonymous on 2025-01-12 10:02:23 [source]
Hello, I've recently started using althttpd with logfile names in this format:
-logfile /logs/althttpd-%Y%m%d.csv
I am launching althttpd as a standalone server with the following command:
sudo ~/althttpsd -port 80 -user www -logfile /logs/althttpd-%Y%m%d.csv -root ~/www -ipshun /ipshun
All appears to be working well.
In my logs directory there are three files (after three days running).
However there are very few entries in date+1 and date+2, and almost all log entries seem to be added to the date+0 logfile, created when the process was first launched.
I can't see any particular pattern with the type of entries that make it into the new logfiles or the original ones.
As far as I can see the file permissions are correct (as evidenced by some entries being written to the new files). The date+1 and date+2 files correctly contain only entries related to these dates. It is only the date+0, original logfile that contains logs from multiple dates, and it also contains the vast majority of the logged data.
I have no doubt that this is a local issue - I'm sure that it would have been noticed before now if not!
I am wondering if this might be the parent althttpd process hanging onto the initial logfile's file handle and not updating it, while daughter processes occasionally open a correctly named new logfile? But this is (probably obviously) not an area of expertise for me.
Any suggestions about solutions or further investigation avenues welcome.
I can obviously filter these dates post hoc fairly easily, but I am unsure why this is not working as I expect it to!
Directory listing of the logs directory:
www@xxx_xxx:~/www$ ls -l logs/
total 116
-rw-r----- 1 www www 103133 Jan 12 09:48 port80-20250110.csv
-rw-r----- 1 www www 1085 Jan 11 16:41 port80-20250111.csv
-rw-r----- 1 www www 107 Jan 12 04:06 port80-20250112.csv
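(For context, the %Y%m%d tokens in the -logfile value are ordinary strftime() conversions. A minimal standalone sketch of that expansion - not code taken from althttpd, just an illustration of the mechanism:)

#include <stdio.h>
#include <time.h>

int main(void){
  char zName[200];
  struct tm tmNow;
  time_t now = time(0);
  gmtime_r(&now, &tmNow);     /* althttpd expands the name from UTC via gmtime_r() */
  if( strftime(zName, sizeof(zName), "/logs/althttpd-%Y%m%d.csv", &tmNow)==0 ){
    zName[0] = 0;             /* pattern too long or otherwise unexpandable */
  }
  printf("expanded logfile name: %s\n", zName);   /* e.g. /logs/althttpd-20250112.csv */
  return 0;
}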
(2) By anonymous on 2025-01-12 19:19:25 in reply to 1 [link] [source]
Ah apologies.
The file names should read eg.
althttpd-20250112.csv
rather than
port80-20250112.csv
Just a copy paste error on my part.
(3) By drh on 2025-01-12 20:23:24 in reply to 1 [link] [source]
Your diagnosis sounds plausible. But I can't know if it is correct without digging into the code.
Here's an idea: The althttpd source code is small (a single file) and accessible. Why not look into the matter and see if you can figure out what is going wrong yourself, then suggest a patch.
The althttpd program has been heavily used for many years (decades). But it has not been widely used. You might be the first person ever to use the variable logfile name syntax while also running althttpd as a standalone server. The heaviest use of althttpd has been for the https://sqlite.org/ website, and in that application, the logfile name is constant and althttpd is launched from xinetd. So we don't have a lot of experience running althttpd as you are running it. Therefore, please help us to debug it.
(4) By anonymous on 2025-01-12 21:02:16 in reply to 3 [link] [source]
Fair enough.
I'll have a crack at it and see if I can uncover anything.
(5) By anonymous on 2025-01-16 15:27:07 in reply to 3 [link] [source]
So, firstly, full disclosure - I am a medical professional - technically an NHS Consultant Paediatric Anaesthetist, and I have never looked at a bit of C code in my life before.
Having said that, you are not wrong that the source code is small and accessible.
Luckily, I don't seem to have needed to write anything new, but instead moved the dynamic logfile naming code from (I think) the global scope into the actual logfile-writing function (maybe - I'm honestly making this up), where I figured it would be run on every write rather than potentially hanging around as a stale value.
Having looked at the code, I do not understand why this should be necessary, or why the name is not automatically recalculated for each request that is served. I certainly don't understand why the vast majority, but not all, of the logged data ended up in the "stale" filename, with some making it into the new one. But that's why I care for 600g neonates, and you guys code database servers.
So... be very careful accepting what follows. It may have memory leaks. It may make you wince. I had to google how to comment out sections of code in C. I honestly am that much of a C novice. I still haven't got a clue what the deal is with nested comments.
Having said all of that, it was about a five minute cut and paste job, and it seems to work perfectly. I'd have posted earlier but I wanted to check it worked over the last four or five days, and it seems to do exactly what I would expect:
Logging all occurring in the correctly named log file.
Memory usage is not blowing up.
Server responsiveness appears unchanged.
No zombie processes are appearing.
Pigeons are not flying backwards.
I've run a diff on my version vs the site version and included it below.
I'd be delighted if this were fixed in the main repo, but please for $deity$'s sake make it look less amateur.
And if it can't be, never mind, I've got my solution. Thank you for writing a webserver that a complete and total programming beginner can tinker with and reengineer to their own requirements.
Index: althttpd.c
==================================================================
--- althttpd.c
+++ althttpd.c
@@ -949,10 +949,18 @@
free(zPostData);
}
zPostData = 0;
}
if( zLogFile && !omitLog ){
+ size_t sz = 0;
+ time_t tNow;
+ time(&tNow);
+ sz = strftime(zExpLogFile, sizeof(zExpLogFile), zLogFile, localtime(&tNow));
+ if( sz==0 || sz>=sizeof(zExpLogFile)-2 ){
+ /* Invalid zExpLogFile name */
+ zExpLogFile[0] = 0;
+ }
struct rusage self, children;
int waitStatus;
const char *zRM = zRemoteUser ? zRemoteUser : "";
const char *zFilename;
@@ -3018,22 +3026,23 @@
#ifdef LOG_HEADER
FILE *hdrLog = 0; /* Log file for complete header content */
#endif
char zLine[10000]; /* A buffer for input lines or forming names */
const MimeTypeDef *pMimeType = 0; /* URI's mimetype */
- size_t sz = 0;
- struct tm vTm; /* Timestamp for zExpLogFile */
+// size_t sz = 0;
+// struct tm vTm; /* Timestamp for zExpLogFile */
- if( zLogFile ){
+/* if( zLogFile ){
assert(beginTime.tv_sec > 0);
gmtime_r(&beginTime.tv_sec, &vTm);
sz = strftime(zExpLogFile, sizeof(zExpLogFile), zLogFile, &vTm);
}
if( sz==0 || sz>=sizeof(zExpLogFile)-2 ){
- /* Invalid zExpLogFile name */
+ // Invalid zExpLogFile name
zExpLogFile[0] = 0;
}
+*/
/* Must see a header within 10 seconds for the first request.
** Allow up to 5 more minutes for the follow-on requests
*/
if( useTimeout ){
if( nRequest>0 ){
(6) By anonymous on 2025-01-21 01:46:38 in reply to 5 [link] [source]
Indeed, I see that I have potentially reversed changes made under point 3 here that were designed to solve a signal safety problem causing hanging processes.
I'm curious, if @sodface is reading, whether they are seeing the wrong logfile being used when dynamic file names are in play, or whether the changes I have proposed cause the hanging processes to return under Alpine.
Certainly it works for me™ but I'm wary of leaving a problem for myself in the future.
(7) By sodface on 2025-01-21 04:26:24 in reply to 6 [link] [source]
Hi, I have been reading, but I only use a single log file and periodically roll it over manually, completely arbitrarily, usually after I've upgraded the OS and had to restart. To test this, I would need to configure dynamic log filenames, run for a while to see if I experience the issue, and then apply your patch and test again, also checking for hung processes?
(8) By anonymous on 2025-01-21 07:32:05 in reply to 7 [link] [source]
Using a dynamic logfile name with the current code in standalone mode would check that it isn't just my setup that somehow generates the issue (of the majority of log entries being written to the first opened filename). Presumably it would be fairly low effort, and you could just cat all the logfiles together if you needed to analyse them as one big logfile per your current practice.
It would be useful to confirm the problem, and so yes that would be helpful if you are willing.
Applying the patch (apologies again for its crudeness but it seems to work effectively) is a much bigger ask.
It also carries a degree of risk of course, though none has yet been demonstrated.
Certainly I am a little concerned about reactivating that hanging process bug (or even some other race condition) and it would be very handy to have a second confirmation that it is either behaving, or find out early that it is not.
I have not seen any of the problems that you described in your previous thread, but I'm running glibc and not musl.
So again, I would be grateful if you would try the patch, but this is a much bigger ask.
As Richard wrote above, it's not part of the althttpd use case for him, and I don't think Stephan uses a standalone setup either from what I remember him describing previously.
No pressure from me either way though, and thank you for replying.
(9) By sodface on 2025-01-21 12:43:59 in reply to 8 [link] [source]
I changed my log file option to match yours, althttpd-%Y%m%d.csv, and restarted althttpd on my sodface.com server this morning. That server isn't mission critical :-) so I don't mind testing with it, plus it's more likely to expose any issues since it's subjected to real internet traffic.
I'll see how it goes. If I understand the issue correctly, I won't see anything today since there's only the one log file, but as the days go on and new log files get created, I should see new entries either in the original day 0 log file (which is the issue) or in the current day log file (as intended), but nothing in any files in between day 0 and current, correct?
(10) By anonymous on 2025-01-21 16:02:05 in reply to 9 [link] [source]
Yes, that's what I've been experiencing. The majority being in the "day_0" file, a small number in "day_current" and nothing new added to any other offset.
Assuming that there is an issue in a few days to a week, you could then consider trying the patch if you were willing. Looking for correct behaviour in terms of fixing the filename issue, and no problems with process hangs or other obvious misbehavior.
It's not a big issue, but it does bug me that it doesn't work correctly, and I'd like to figure out why.
Thanks!
(11.2) By sodface on 2025-01-23 04:32:24 edited from 11.1 in reply to 10 [link] [source]
I am seeing the same issue. I had two log files when I first started looking at it tonight:
althttpd-20250121.csv
althttpd-20250122.csv
I had 551 entries for the 22nd in the log file for the 21st, and only 42 entries in the logfile for the 22nd. I wanted to see if I could figure out what was happening before applying any patches. I think the issue is related to running in standalone mode as Dr. Hipp commented in post #3.
The ProcessOneRequest function sets the expanded logfile name, but it uses beginTime on line 3028, which was assigned a value in main() (line 3829) when althttpd first started. We are in the child process at this point, but beginTime is still whatever the parent set it to in main(), and since we are in standalone mode the parent process stays running, so beginTime doesn't get a new value until althttpd is restarted.
The child process does assign beginTime a new value on line 3065, but that's after the logfile name has already been expanded. So rather than apply your patch, I just moved that line up to before the logfile name expansion:
--- ./althttpd.c.orig
+++ ./althttpd.c
@@ -3023,6 +3023,7 @@
size_t sz = 0;
struct tm vTm; /* Timestamp for zExpLogFile */
+ gettimeofday(&beginTime, 0);
if( zLogFile ){
assert(beginTime.tv_sec > 0);
gmtime_r(&beginTime.tv_sec, &vTm);
@@ -3062,7 +3063,6 @@
althttpd_exit(0);
}
clock_gettime(ALTHTTPD_CLOCK_ID, &tsBeginTime);
- gettimeofday(&beginTime, 0);
omitLog = 0;
nIn += (i = (int)strlen(zLine));
I'm retesting now but I added the hour to the logfile format althttpd-%Y%m%d-%H.csv to see if I could speed up duplicating the issue (or not if my patch worked). So far it seems to be working correctly but it's only just created the second hour file.
The one thing that I'm struggling to explain is why it ever logged to the correct file. I can't see why beginTime would ever have been correct after the first day. It seems like the entries in the day_1 log all had corresponding entries at the exact same time in the day_0 log, so maybe something to do with requests being extremely close together? I can't see it in the code though.
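To make the cause concrete, here's a heavily simplified sketch of the standalone flow described above - a paraphrase with stub bodies, not althttpd's actual code (the line numbers in the comments are the ones quoted earlier):

#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

static struct timeval beginTime;      /* global, as in althttpd */

static void ProcessOneRequest(int conn){
  /* The real function expands zExpLogFile from beginTime near its top
  ** (around line 3028). On a child's first request that is still the
  ** value the parent set at startup, so the entry lands in the day-0
  ** file. Only later (around line 3065) does it refresh beginTime,
  ** which is too late for this request's filename. */
  (void)conn;
}

int main(void){
  int listener = socket(AF_INET, SOCK_STREAM, 0);
  /* bind()/listen() omitted from this sketch */
  gettimeofday(&beginTime, 0);        /* set once in main() (around line 3829) */
  for(;;){
    int conn = accept(listener, 0, 0);
    if( fork()==0 ){
      ProcessOneRequest(conn);        /* child inherits the stale beginTime */
      _exit(0);
    }
    close(conn);                      /* parent just loops; its beginTime never changes */
  }
}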
(12) By anonymous on 2025-01-23 06:28:36 in reply to 11.2 [link] [source]
The one thing that I'm struggling to explain is why it ever logged to the correct file.
I'll be honest, this is also what most intrigued me. I assume that, occasionally, the child chooses to serve more than one request - pipelining perhaps. The correct file entries appear to occur when multiple assets were downloaded (at least for me) and usually there is an identical timestamp (same second) in both logfiles.
Your fix makes sense and also sounds like a far safer intervention than my hack - I'm most uncomfortable about having shifted all that dynamic code into the makelog function, given your previous experiences with signal safety.
I lacked the experience to see that as a solution! Still, I'm now infinitely more skilled as a C hacker (n=1 vs n=0), so I've certainly benefitted in many ways here.
I shall apply your proposal to my server and check it also works here too, but I certainly expect it to.
Very much obliged, I owe you a pint.
(14) By sodface on 2025-01-23 13:03:23 in reply to 12 [link] [source]
I'll be honest, this is also what most intrigued me. I assume that, occasionally, the child chooses to serve more than one request - pipelining perhaps. The correct file entries appear to occur when multiple assets were downloaded (at least for me) and usually there is an identical timestamp (same second) in both logfiles.
Yes, exactly my observations also (although better described!) and it sort of bugs me that I don't see the reason when staring at the code.
I've benefitted in many ways here.
Me too. Although the althttpd code is well commented and readable, I'm still just an inexperienced hobbyist and never quite followed the forking logic despite looking at it a few times previously. I think I understand it a bit better now.
I owe you a pint.
If only I could take you up on that without boarding a plane!
(17) By anonymous on 2025-01-23 20:30:08 in reply to 14 [link] [source]
If only I could take you up on that without boarding a plane!
I'm in south central England. Let me know if you're ever around!
(13) By Stephan Beal (stephan) on 2025-01-23 08:18:34 in reply to 11.2 [link] [source]
So rather than apply your patch, I just moved that line up to before the logfile name expansion:
Before i apply this patch, can you please try one more thing:
i suspect that the clock_gettime() call should also be moved along with the gettimeofday(), keeping its same relative position.
The clock_gettime() part is used for populating column 12 of the log - the total "wall clock" response time. Moving that up to the start of ProcessOneRequest() also looks to be more correct, in that it will count, e.g., the SSL setup time.
If that change works for you, without new weirdness, i'll apply that.
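For clarity, the combined change would presumably leave the top of ProcessOneRequest() looking roughly like this (my paraphrase of the code excerpts quoted earlier in this thread, not a committed patch):

  size_t sz = 0;
  struct tm vTm;                                   /* Timestamp for zExpLogFile */

  clock_gettime(ALTHTTPD_CLOCK_ID, &tsBeginTime);  /* start wall-clock timing here */
  gettimeofday(&beginTime, 0);                     /* fresh date for the logfile name */
  if( zLogFile ){
    assert(beginTime.tv_sec > 0);
    gmtime_r(&beginTime.tv_sec, &vTm);
    sz = strftime(zExpLogFile, sizeof(zExpLogFile), zLogFile, &vTm);
  }
  /* ...with both calls removed from their original position after the
  ** header-read loop (formerly around lines 3062-3063). */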
(15) By sodface on 2025-01-23 13:05:34 in reply to 13 [link] [source]
can you please try one more thing
I applied your change Stephan and am currently running with it, with the per-hour log files. Will report back. Belated Happy New Year!
(21) By Stephan Beal (stephan) on 2025-01-24 13:17:58 in reply to 15 [link] [source]
I applied your change Stephan and am currently running with it, with the per-hour log files.
That change is now in the trunk and also running on my server (which is not in standalone mode, though).
Belated Happy New Year!
You as well!
(22) By anonymous on 2025-01-24 13:49:23 in reply to 21 [link] [source]
That change is now in the trunk
Fantastic, thank you both.
(25) By sodface on 2025-01-24 14:23:40 in reply to 22 [link] [source]
Fun stuff! Did you confirm the fix also?
(27) By anonymous on 2025-01-24 16:20:20 in reply to 25 [link] [source]
I'm running it now.
(28) By anonymous on 2025-01-24 17:36:07 in reply to 25 [link] [source]
So, almost!
I've only been running this for a couple of hours, clearly.
Some of the entries are in the wrong logfile by a couple of seconds (I don't know if I should be surprised by this or not, but it certainly isn't a problem).
One single logfile (I am naming them minute by minute) has 2 entries from two minutes in the future.
This is under as much load as I can generate on my server, and it may be that the request was actually initiated in the "correct" minute but is being recorded as the time when it finally timed out, assuming that I have actually managed to DOS myself.
Again, this isn't a problem, but I will need to watch it for a while longer before I can be certain that I'm no longer seeing the behaviour.
For all practical purposes though, yes, this is working well (and again, far more elegant than my attempt!)
(29) By anonymous on 2025-01-24 17:42:10 in reply to 25 [link] [source]
Ah yes. The offending log entries have "total wall clock time" values of about 157,000,000, so they seem to have taken nearly three minutes to complete.
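(If that column is in microseconds, as the magnitude suggests: 157,000,000 microseconds = 157 seconds, i.e. roughly 2 minutes 37 seconds.)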
I should probably stop attacking my own web server now.
(23.1) By sodface on 2025-01-26 04:43:28 edited from 23.0 in reply to 21 [link] [source]
Thanks Stephan! FYI, post #14 shows awaiting moderation?
(24) By sodface on 2025-01-24 14:22:15 in reply to 23.0 [link] [source]
I mean post #17
(26) By Stephan Beal (stephan) on 2025-01-24 14:39:24 in reply to 24 [link] [source]
I mean post #17
Indeed - we missed that one. It's now approved.
(16) By anonymous on 2025-01-23 20:28:17 in reply to 13 [link] [source]
i suspect that the clock_gettime() call should also be moved along with the gettimeofday(), keeping its same relative position.
Presumably you think this might affect everyone?
Not just the dynamic log dates?
I haven't seen any wildly massive or negative entries for that column, associated with the logfile date issue.
(18) By Stephan Beal (stephan) on 2025-01-23 20:35:24 in reply to 16 [link] [source]
Presumably you think this might affect everyone?
Frankly, no, because apparently only people who run althttpd in standalone mode (yourself and sodface) are affected by the logging oddity at all. Those running it via xinetd or similar (namely myself and Richard) apparently aren't seeing this (or it hasn't affected us enough that we'd notice).
Not just the dynamic log dates?
The log date, no, but the info in the logs. The current placement seems off (noting that it was very possibly put in that spot by me) - it should, i currently opine, be near the top of that function. Not that a microsecond here or there really matters, but if we're touching that code then let's touch it!
(19) By anonymous on 2025-01-23 21:06:29 in reply to 18 [link] [source]
Thank you - mostly just checking my own understanding of what you are suggesting!
(20) By sodface on 2025-01-23 22:35:28 in reply to 13 [link] [source]
Looking at my server logs now (I actually was ssh'd into my server all day, oops!), you can see by the modified times that it worked as desired:
831 Jan 23 12:59 althttpd-20250123-12.csv
13.8K Jan 23 13:59 althttpd-20250123-13.csv
10.4K Jan 23 14:55 althttpd-20250123-14.csv
11.0K Jan 23 15:58 althttpd-20250123-15.csv
9.2K Jan 23 16:51 althttpd-20250123-16.csv
7.5K Jan 23 17:58 althttpd-20250123-17.csv
6.1K Jan 23 18:56 althttpd-20250123-18.csv
3.9K Jan 23 19:59 althttpd-20250123-19.csv
4.6K Jan 23 20:57 althttpd-20250123-20.csv
5.8K Jan 23 21:58 althttpd-20250123-21.csv
3.0K Jan 23 22:21 althttpd-20250123-22.csv
I also spot-checked the files and they only contain entries for their specific hour. The first one is smaller because I started the server just prior to the top of the hour. So, I think it's looking good.