Experimental anti-robot option: --ipshun

(1) By drh on 2022-03-24 18:22:44 [source]

On the ipshun branch I have checked in changes for a new command-line option that enables more active defenses against attack robots.

If the "--ipshun DIRECTORY" option is passed to althttpd, DIRECTORY is a full pathname (begins with "/") accessible from within the chroot jail, and the IP address of the client appears as a file within that directory, then althttpd might return 503 Service Unavailable rather than process the request.

If the file is zero bytes in size, then 503 is always returned. Thus you can "touch" a file that is an IP address name to permanently banish that client.
If the file is N bytes in size, then 503 is returned if the mtime of the file is less than 60*N seconds ago. In other words, the client is banished for one minute per byte in the file.
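
The two rules above can be sketched in C roughly as follows. This is only an illustration of the rule as described in this post, not the code on the ipshun branch; the function names shun_is_active() and ip_is_shunned(), the parameter names, and the 1024-byte path buffer are all made up for the example.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Decision rule from the post: a zero-byte file is a permanent ban,
** an N-byte file is a ban lasting 60*N seconds from the file's mtime.
** (Hypothetical helper, not taken from althttpd.) */
static int shun_is_active(long long size, time_t mtime, time_t now){
  if( size==0 ) return 1;                      /* permanent ban */
  return (long long)(now - mtime) < 60*size;   /* one minute per byte */
}

/* Return 1 if a request from zIp should get a 503, else 0.
** Costs a single stat() call on DIRECTORY/IP. */
static int ip_is_shunned(const char *zDir, const char *zIp, time_t now){
  char zPath[1024];                            /* arbitrary size for the sketch */
  struct stat st;
  int n = snprintf(zPath, sizeof(zPath), "%s/%s", zDir, zIp);
  if( n<0 || n>=(int)sizeof(zPath) ) return 0; /* path too long: do not shun */
  if( stat(zPath, &st)!=0 ) return 0;          /* no file: not shunned */
  return shun_is_active((long long)st.st_size, st.st_mtime, now);
}
```

With this rule a 3-byte file blocks the client for 180 seconds after its mtime, and a missing file never blocks.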

Banishment files are automatically created if althttpd gets a request that would have resulted in a 404 Not Found, and upon examining the REQUEST_URI the request looks suspicious. Any request that includes /../ is considered a hack attempt, for example. There are other common vulnerability probes that are also checked. This list of vulnerability probes will probably grow with experience.
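
As a minimal sketch, the one rule named above (a /../ in the REQUEST_URI) could be tested like this. The function name is hypothetical, and the other probes are not listed in this post, so they are deliberately left out rather than guessed at.

```c
#include <string.h>

/* Return 1 if the URI contains "/../", the only rule spelled out
** in the post. Intended to be consulted only for requests that would
** otherwise have produced a 404. (Hypothetical helper name.) */
static int looks_like_hack_attempt(const char *zUri){
  return strstr(zUri, "/../")!=0;
}
```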

The banishment files are automatically unlinked after 5 minutes/byte.

Banishment files are initially 1 byte in size. But if a banishment expires and then a new attack is detected prior to the 5 min/byte cleanup time, then the file grows by one byte and the mtime is reset.
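
To make the arithmetic concrete: a 1-byte file blocks for 60 seconds and is eligible for unlinking after 300 seconds. A new attack in between grows it to 2 bytes, which blocks for 120 seconds and lingers for up to 600. A rough sketch of that escalation rule is below. The function name and the "negative size means no file" convention are invented for the example, and leaving zero-byte (permanent) files untouched is an assumption, since the post does not say how cleanup treats them.

```c
#include <time.h>

/* Size the banishment file should have after a new attack at time
** `now`. prevSize<0 means no file exists yet. (Hypothetical helper.) */
static long long next_ban_size(long long prevSize, time_t mtime, time_t now){
  if( prevSize<0 ) return 1;        /* first offence: 1-byte file */
  if( prevSize==0 ) return 0;       /* permanent ban: assumed left alone */
  if( (long long)(now - mtime) >= 300*prevSize ){
    return 1;                       /* past 5 min/byte: file would be unlinked */
  }
  /* Ban expired but file still present: escalate by one byte.
  ** (While the ban is active the client gets a 503 first, so this
  ** function should not be reached in that state.) */
  return prevSize + 1;
}
```

The caller would then write a file of the returned size, which also resets the mtime.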

Motivation

There was a nasty robot yesterday that was filling out forms and causing problems. It appeared to be the same robot, though two separate IP addresses, one from Cyprus and the other from Hungary. I've long since null-routed both of those IP addresses. This enhancement, had it been deployed yesterday, would have likely blocked that robot prior to it causing problems.

Using The Filesystem As A Database

Why not keep the list of shunned IP addresses in (say) an SQLite database? Because while SQLite is very fast (even faster than stat()) once it gets initialized, opening a database connection involves a lot of initialization. For example, it has to read and parse the schema every time it is opened. I don't want to introduce that much overhead for every incoming HTTP request. As currently designed, we do a single stat() system call to determine whether or not the IP address should be shunned.

(2) By Stephan Beal (stephan) on 2022-03-24 18:39:30 in reply to 1 [link] [source]

If the file is zero bytes in size, then 503 is always returned. Thus you can "touch" a file that is an IP address name to permanently banish that client.

Might it make sense to move the current list of hard-coded blocked hosts to such files, using hostnames instead of IPs? Presumably those hosts aren't really relevant for most hosters (in that they're not being attacked by them)?

(3) By sodface on 2022-03-24 22:21:21 in reply to 1 [link] [source]

I've had an issue for quite a while that I should have addressed by now but I haven't taken the time out to figure out the best solution and I'm wondering if this new feature would apply?

I think the issue I have is that some Cloudflare customer is pointing traffic for their domain to my site's IP address. It might even have been a legitimate configuration at one point and then they got rid of the server, I got the recycled IP from the shared host, and they never bothered to update the config. Or something.

It was worse at first because I had a default.website entry set up, so actual page content was being returned under the "wrong" domain. I disabled that, so now at least a 404 is returned, but that's not ideal because 404 is mostly interpreted as "you came to the right place but the thing you are looking for isn't here" and I'd rather return "you're at the wrong place" or, even better, nothing at all and just drop the request.

Since Oct of 21, I have 27435 page requests for the domain I do not host (or own), from 897 IP addresses, which, when I spot check them, all belong to Cloudflare.

A typical log entry is:

2022-03-24 04:45:06,162.158.146.207,"http://www.pokelifehacks.com/","",404,505,262,565,0,0,0,438,1,"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36","",29,400

Maybe this would be better handled at the firewall by summarizing the source IPs and DROPing the packets.

(4) By sean (naes_guy) on 2022-03-25 15:41:24 in reply to 1 [link] [source]

If the file is zero bytes in size, then 503 is always returned. Thus you can "touch" a file that is an IP address name to permanently banish that client.

I haven't tested...do you think this would be a problem with ipv6 addresses?

(5) By Stephan Beal (stephan) on 2022-03-25 16:00:10 in reply to 4 [link] [source]

do you think this would be a problem with ipv6 addresses?

It works fine for ipv6 - saw it happen yesterday.