Jan 15, 2018
 

As I got CentOS 7.4 running, a rather strange thing happened. When I ran the usual ll (alias for ls -lA), I got a slightly unexpected result:

# ll
drwxrwx--- 4 apache apache  4096 Dec 24 06:50 download
-rw-rw---- 1 apache apache  5430 Dec 23 08:06 favicon.ico
-rw-rw---- 1 apache apache 12300 Dec 26 02:25 .htaccess
-rw-rw---- 1 apache apache   460 Dec 23 08:06 index.php
-rw-rw---- 1 apache apache   117 Dec 23 20:39 robots.txt
drwxrwx--- 2 apache apache  4096 Dec 26 01:44 .well-known
drwxrwx--- 5 apache apache  4096 Dec 23 17:32 wordpress

Can you spot the issue?

Yep, CentOS got a bit (too) smart: under the default locale, sorting ignores the leading dot and slots those files into alphabetical order with everything else. For those used to dot files on top, tough luck.

Well, it’s possible to “correct” this behavior using a slightly different alias in .bashrc:

alias ll='LC_COLLATE=C ls -lA'

This gives a (properly) sorted output:

# ll
-rw-rw---- 1 apache apache 12300 Dec 26 02:25 .htaccess
drwxrwx--- 2 apache apache  4096 Dec 26 01:44 .well-known
drwxrwx--- 4 apache apache  4096 Dec 24 06:50 download
-rw-rw---- 1 apache apache  5430 Dec 23 08:06 favicon.ico
-rw-rw---- 1 apache apache   460 Dec 23 08:06 index.php
-rw-rw---- 1 apache apache   117 Dec 23 20:39 robots.txt
drwxrwx--- 5 apache apache  4096 Dec 23 17:32 wordpress
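
A quick way to see the collation difference without touching any files is to push a few names through sort. This is only a sketch: sort_c is a made-up helper name, and it uses LC_ALL rather than LC_COLLATE so that the result does not depend on whatever locale variables are already exported.

```shell
# Hypothetical helper (not part of the original alias): sort the given
# names by plain byte value, which is what the C locale does.
sort_c() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

# Dot files come first because '.' sorts before any letter in byte order.
sort_c wordpress .htaccess download .well-known index.php
```

One thing to keep in mind: if LC_ALL is set in your environment, it takes precedence over LC_COLLATE, and the alias above would have no visible effect.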

Jan 10, 2018
 

As I planned the move of my site to Linode, I first needed a place to test. It was easy enough to create a test domain and fill it with migrated data, but I didn’t want Google (or any other bot) to index it. The easiest way to do so was to require authentication. In the Apache configuration, that can be done using the Directory directive:

<Directory "/var/www/html">
    AuthType Basic
    AuthUserFile "/var/www/.htpasswd"
    Require valid-user
</Directory>

However, this also meant that my robots.txt with its disallow statements was off limits too. What I really wanted was to allow unauthenticated access to robots.txt only, while still requiring a password for everything else.

A bit of modification later, this is what I came up with:

<Directory "/var/www/html">
    AuthType Basic
    AuthUserFile "/var/www/.htpasswd"
    Require valid-user
    <Files "robots.txt">
        Allow from all
        Satisfy Any
    </Files>
</Directory>
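
If the server runs Apache 2.4 (which is what CentOS 7 ships), Allow and Satisfy are only available through the compatibility module, mod_access_compat. A sketch of the same idea in native 2.4 syntax might look like the following. The AuthName value is just a placeholder, added on the assumption that no realm is defined elsewhere in the configuration, since Basic authentication expects one.

```apache
<Directory "/var/www/html">
    AuthType Basic
    # Placeholder realm name (an assumption, not from the original config)
    AuthName "Test site"
    AuthUserFile "/var/www/.htpasswd"
    Require valid-user
    <Files "robots.txt">
        # Overrides the enclosing "Require valid-user" for this one file
        Require all granted
    </Files>
</Directory>
```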

Jan 06, 2018
 

It has been a very scary start of the year. We’re only a few days in and the world is already falling apart. If you aren’t scared already, it is enough to see a demonstration of the Meltdown and Spectre exploits to feel very uncomfortable.

I won’t go into the details as this dreadful exploit family already has a web page with all the information one could desire to know. If that’s not enough, probably every major news outlet has an article or two about it.

In the midst of all this ruckus and panic, unfortunately, there is nothing for most of us to do. Due to the nature of these faults, the fix has to be done either in hardware (albeit with some mitigations via microcode update) or in the OS kernel of your choice. There is simply nothing an application developer can realistically do but wait. Once the “big boys” have done their work, there will be a flurry of activity if you need to do some performance testing, and that’s it. Explicit regression testing will not be needed as you have it automated to run overnight anyhow (wink-wink) and the risk of user code breakage is quite low.

If you are dealing with OS maintenance, you will have a bit more work to do. While some patches are already out, more are still expected, and I trust Murphy will ensure that at least some patches will receive patches of their own. If you are dealing with a cloud environment you will have your work multiplied by a factor but that comes with a saving grace of easily automating stuff across many machines. It will be busy but surmountable.

Those of us who also deal with hardware, I pity. Updating firmware is annoying even when there is no pressure. Generally the machine has to go down to even think about it. Then you will try to automate it only to find out that 50% of your blades simply didn’t “take” the update and the vendor coolly advises that “it sometime happens” and that you should proceed with manual installation.

And, of course, these servers haven’t had their firmware updated for a while, and the microcode you want will come with a bunch of other firmware fixes and changes you don’t want to deal with right now. Tough luck: microcode will not be “backported” to your current version. Just hope it doesn’t change some obscure default, causing an issue when the machine is finally booted up, and that you won’t need to update your pristine 1.0 to some other version before you can even think about getting the latest.

And please don’t think about going home, because you’ll see a BIOS with the microcode update ready in the next few days for your home computer too. For example, my Dell has had it for a couple of days now. So you will go updating all your personal computers only to discover your wife’s laptop doesn’t boot anymore…

May you live in interesting times, indeed.

Jan 05, 2018
 

As I got my web server running, I decided to keep an eye on the Apache logs for potential issues. My idea was to have a basic script that would, on a single screen, show both access and error logs in a green/yellow/red pattern depending on HTTP status and error severity. And I didn’t want to see the whole log. I wanted to keep the information to a minimum, just enough to determine whether things are going well or badly. If I see something suspicious, I can always check the full logs.

The error log is easy enough, but parsing the access log in the combined log format (an extension of the NCSA common log format) is annoyingly difficult due to its “interesting” choice of delimiters.

Just look at this example line:

108.162.245.230 - - [26/Dec/2017:01:16:45 +0000] "GET /download/bimil231.exe HTTP/1.1" 200 1024176 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"

The first three entries are space-separated, easy enough. Then comes the date, in probably the craziest format one could find, enclosed in square brackets. Then we have the request line in quotes, followed by a couple more space-separated values. And we finish with a few quoted values again. Command-line parsing was definitely not on the mind of whoever “designed” this.

With Apache you can of course customize the logging format, but guess what? While you can make something that works better with command-line tools, you will lose a plethora of tools that already work with the NCSA format, most notably Webalizer. It might be a bad choice for the command line, but it’s the standard regardless.

And the extreme flexibility of Linux tools also means you can do some trickery to parse fields even when you deal with something as mangled as NCSA.
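
One such trick is gawk’s FPAT variable, which describes what a field looks like instead of what separates fields. As a rough sketch (show_fields is a made-up helper name, and gawk is assumed to be installed), this prints every field of the example line together with its index:

```shell
# Sample line from above, kept in single quotes so nothing gets expanded.
line='108.162.245.230 - - [26/Dec/2017:01:16:45 +0000] "GET /download/bimil231.exe HTTP/1.1" 200 1024176 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"'

# Hypothetical helper: split one log line with FPAT and list its fields.
# A field is either a run of non-spaces, a "quoted string", or a [bracketed block].
show_fields() {
  printf '%s\n' "$1" | gawk '
    BEGIN { FPAT="([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" }
    { for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }
  '
}

show_fields "$line"
```

With this pattern the bracketed date lands in field 4, the quoted request line in field 5, and the status code in field 6, no matter how many spaces hide inside the brackets or quotes.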

After a bit of trial and error, my final product was a script looking a bit like this:

#!/bin/bash

LOG_DIRECTORY="/var/www"

trap 'kill $(jobs -p)' EXIT

tail -Fn0 "$LOG_DIRECTORY/apache_access.log" | gawk '
  BEGIN { FPAT="([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" }
  {
    code=$6
    request=$5

    ansi="0"
    if (code==200 || code==206 || code==303 || code==304) {
      ansi="32;1"
    } else if (code==301 || code==302 || code==307) {
      ansi="33;1"
    } else if (code==400 || code==401 || code==403 || code==404) {
      ansi="31;1"
    }
    printf "%c[%sm%s%c[0m\n", 27, ansi, code " " request, 27
  }
' &

tail -Fn0 "$LOG_DIRECTORY/apache_error.log" | gawk '
  BEGIN { FPAT="([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" }
  {
    level=$2
    text=$5 " " $6 " " $7 " " $8 " " $9 " " $10 " " $11 " " $12 " " $13 " " $14 " " $15 " " $16

    ansi="0"
    if (level~/info/) {
      ansi="32"
    } else if (level~/warn/ || level~/notice/) {
      ansi="33"
    } else if (level~/emerg/ || level~/alert/ || level~/crit/ || level~/error/) {
      ansi="31"
    }
    printf "%c[%sm%s%c[0m\n", 27, ansi, level " " text, 27
  }
' &

wait

The script tails both the error and access logs and waits for Ctrl+C. Upon exit, it kills the spawned jobs via trap.

For the access log, the gawk script checks the status code and colors entries accordingly. Green is for 200 OK, 206 Partial Content, 303 See Other, and 304 Not Modified; yellow for 301 Moved Permanently, 302 Found, and 307 Temporary Redirect; red for 400 Bad Request, 401 Unauthorized, 403 Forbidden, and 404 Not Found. All other codes remain default/gray. Only the status code and the request line are printed.
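
The same mapping can be tried out in plain shell, without tailing any log. This is just a sketch mirroring the gawk branches; status_color is a made-up helper name.

```shell
# Hypothetical helper: return the ANSI attribute the script would use
# for a given HTTP status code.
status_color() {
  case "$1" in
    200|206|303|304) echo "32;1" ;;   # bright green
    301|302|307)     echo "33;1" ;;   # bright yellow
    400|401|403|404) echo "31;1" ;;   # bright red
    *)               echo "0"    ;;   # terminal default
  esac
}

# Print a few codes in their colors; \033 is the escape character (27).
for code in 200 301 404 500; do
  printf '\033[%sm%s\033[0m\n' "$(status_color "$code")" "$code"
done
```

Note that 500 falls through to the default color here, just as it does in the script; adding the 5xx codes to the red branch would be one possible extension.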

For the error log, the gawk script checks only the error level. Green is used for Info; yellow for Warn and Notice; red for Emerg, Alert, Crit, and Error. All others (essentially debug and trace) remain default/gray. The printout consists of just the error level and the first 12 words of the message.

This script not only shortens quite long error and access log lines to their most essential parts; the coloring also lets one see the most important issues at a glance, even when lines are flying by. Additionally, having the two logs interleaved lends itself nicely to a single-screen monitoring station.