I dislike administering systems. If all I ever had to do were to type
apt-get update and have all of my system administration done for me, that would be fine. Unfortunately, I have to administer systems now and then.
Fortunately, the free software world has a lot of people in the same
situation, and a lot of smart people have written useful software to manage
their systems. As a case in point, consider fail2ban, which I'd have had to invent if it
didn't already exist.
fail2ban watches log files for suspicious
patterns and sends traffic from the offending IP addresses to a blackhole. For
example, if some malicious remote machine in a botnet comes knocking at your
SSH server with a dictionary full of usernames,
fail2ban will let
the kernel silently drop all network traffic from that machine for an hour
after the third failed login.
That's all configurable. In fact, you can configure all of the existing rules and add new rules yourself.
I did that the other day on a client's server. Somehow, the Internet at large had decided that a web-based database administration tool called phpMyAdmin was running on the server. That meant thousands of attempts to find dozens of versions of phpMyAdmin. (I assure you, there is no PHP running on that machine. phpMyAdmin has security holes? Who would have guessed?) That meant a lot of wasted resources and a lot of useless entries in the log files. (We hadn't yet gotten around to monitoring log files for reporting, so it was worse than it should have been.)
"Self," I told myself. "You should add a
fail2ban rule to
detect phpMyAdmin scans and drop that traffic."
I did. It was more difficult than it should have been.
fail2ban uses regular expressions to find individual entries in
log files which represent suspicious access patterns. One line in a log file
represents one event. This is the Unix way. This has been the Unix way for 40
years. It's been the Unix way for 40 years for one reason: it works pretty
well, for the most part. (I like Unix, but I see its flaws.)
The web application I intended to secure has an administrative interface at
/admin. This makes sense. One of the places you can
install phpMyAdmin is also
/admin. This also makes a certain
amount of sense.
The routing system in the client's web application redirects all requests
under the Admin controller (the code counterpart to
/admin) to a
catchall action so as not to expose internal details of what is and isn't
available with or without specific authentication credentials. This makes sense
when I think about it one way and doesn't necessarily make sense another way.
(It's not entirely what someone might call RESTful and it's almost certainly a
violation of the HATEOAS concordat. Then again, it's an administrative
interface hidden from the Internet at large behind authentication.)
The first version of my regular expression looked for all attempts to access
/PhpMyAdmin, et al.,
which resulted in a redirection.
/admin also redirects real users with real web browsers to
/admin/login to give them a chance to use a login
mechanism that's not nearly as hateful as the basic authentication dialog
that's been largely unchanged in web browsers since 1994. (You remember 1994.
That's before PHP existed and before Windows machines were on the Internet in
such droves that it made sense to gather a huge botnet of poorly secured
Windows machines to search for phpMyAdmin vulnerabilities. Also you could have
bought AAPL at a deep discount compared to now.)
Unfortunately, my first regular expression matched users going to
/admin and getting redirected to
/admin/login just as
well as it matched bots going to
/phpMyAdmin and getting
redirected to an error page.
I changed the regular expression. We could also have made
/admin display a login form to an unauthorized user. We could have
done a lot of things. I changed the regular expression.
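The original expressions aren't reproduced here, so what follows is only a sketch of the failure and the fix. It assumes Apache common log format, and the sample lines, addresses, and both patterns are invented for illustration. In a real fail2ban filter the address would be captured with fail2ban's <HOST> placeholder; the named group below stands in for it.

```python
import re

# Hypothetical access-log lines (addresses come from documentation ranges).
LOG_LINES = [
    '198.51.100.7 - - [10/Oct/2000:13:55:36 -0700] "GET /phpMyAdmin/index.php HTTP/1.1" 302 215',
    '198.51.100.7 - - [10/Oct/2000:13:55:37 -0700] "GET /admin/phpmyadmin/ HTTP/1.1" 302 215',
    '192.0.2.44 - - [10/Oct/2000:14:02:11 -0700] "GET /admin HTTP/1.1" 302 198',
]

# First attempt (assumed): any redirected GET looks suspicious.
TOO_BROAD = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "GET \S+ HTTP/[\d.]+" 302 '
)

# Second attempt (assumed): only redirects whose path mentions phpMyAdmin.
NARROWER = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "GET \S*phpmyadmin\S* HTTP/[\d.]+" 302 ',
    re.IGNORECASE,
)


def offending_hosts(pattern, lines):
    """Return the set of client addresses whose log lines match the pattern."""
    return {m.group("host") for line in lines if (m := pattern.search(line))}


broad = offending_hosts(TOO_BROAD, LOG_LINES)    # includes the real user at 192.0.2.44
narrow = offending_hosts(NARROWER, LOG_LINES)    # only the scanner
```

Both the bot and the real user produce a 302, so nothing in the status code tells them apart. The narrower pattern only works because the request path happens to carry the distinction.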
The next day, I realized the problem was that the standard Unix mechanism of
logging plain text in a well-understood format and parsing it with regular
expressions (or even a grammar) threw away information and tried to reconstruct
it badly. At the point in the web application where the router received a
remote request and redirected it, the router knows exactly why it is
redirecting the request. It knows that
/phpMyAdmin is an
invalid route. It knows that an authenticated user requesting
/admin should get redirected to the administrative dashboard. It
knows that an unauthenticated user requesting
/admin should get redirected to /admin/login.
Unfortunately, none of that reasoning gets into the Apache httpd-style log
file. It gets a datestamp, an IP address, the URL request path, and an HTTP
status response code. From there,
fail2ban and the regular
expression guess at why that log entry is there.
fail2ban is a good Unix program and is flexible
about which log file it scans. I could add another log file to the web
application to write entries only when something makes a request for a path
that's completely unknown; if there's no controller mapped to the request path
/phpmyadmin, write to the log. That's only slightly more
difficult to create and to configure than it is to explain. You probably
already know how to do it.
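For anyone who would rather see it than imagine it, here is a minimal sketch of that second log. The client's application and its router aren't shown in this post, so the controller table, the logger name, and the line format are all assumptions made up for the example.

```python
import logging

# Hypothetical set of first path segments that map to a controller.
KNOWN_CONTROLLERS = {"", "admin", "static"}


def make_unknown_route_logger(log_path):
    """Build a logger that writes one line per unknown route to log_path."""
    logger = logging.getLogger("unknown_routes")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these entries out of the main application log
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def controller_for(request_path):
    """Return the lowercased first path segment: '/PhpMyAdmin/x' -> 'phpmyadmin'."""
    return request_path.lstrip("/").split("/", 1)[0].lower()


def note_request(logger, client_ip, request_path):
    """Log the request only if no controller is mapped; return True if logged."""
    if controller_for(request_path) in KNOWN_CONTROLLERS:
        return False
    logger.info("unknown route from %s: %s", client_ip, request_path)
    return True
```

The router would call note_request once per incoming request, and fail2ban would be pointed at the new file. A scan for /admin/phpmyadmin would still slip past this version, because the Admin controller's catchall claims everything under /admin.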
Unfortunately, writing a separate log file only works around the problem. I
still have to write a regular expression to parse lines in that log file so
fail2ban will handle them appropriately. That's the Unix
philosophy at work. It works pretty well and it's worked pretty well for
decades. Sure, there are ambiguities, but you can work around them pretty well.
Sometimes, though, I tell myself that what I think I want is the ability to send structured data as events to a centralized event listener system to which other processes can connect as listeners. I know there are things like systemd and D-Bus from freedesktop.org, but I rewrote the regex because "pretty well" gets the job done now and I don't expect this system to last 40 years.
(In fact, that sums up Unix pretty well too.)
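For the curious, here is roughly what the structured version might look like. This is a toy, in-process sketch rather than a real cross-process event system, and every name in it (RouteEvent, EventBus, the reason strings, the threshold of three) is hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RouteEvent:
    client_ip: str
    path: str
    status: int
    reason: str  # assumed values: "unknown_route", "login_required", "to_dashboard"


class EventBus:
    """Deliver each published event to every subscribed listener."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def publish(self, event):
        for listener in self._listeners:
            listener(event)


def make_ban_listener(banned, threshold=3):
    """Return a listener that adds an address to banned after repeated unknown routes."""
    counts = {}

    def listener(event):
        if event.reason != "unknown_route":
            return  # the router already said why; nothing to guess
        counts[event.client_ip] = counts.get(event.client_ip, 0) + 1
        if counts[event.client_ip] >= threshold:
            banned.add(event.client_ip)

    return listener


def to_wire(event):
    """Serialize an event as JSON, as it might be sent to another process."""
    return json.dumps(asdict(event), sort_keys=True)
```

The point of the sketch is the reason field: the router records why it redirected at the moment it knows, so the listener never has to reconstruct intent from a path and a status code.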