Logstash recipe – Apache access log

I’ll describe here how to use logstash and logstash-forwarder to harvest the Apache access logs on a web server so you can centralize them in elasticsearch and kibana.

I recommend checking my previous post for the software setup: Centralized logging with an ELK stack (Elasticsearch-Logstash-Kibana) on Ubuntu

Apache log format

Logstash can parse JSON through a dedicated input codec, so we’ll define a JSON logging format for the Apache access logs. That way we don’t have to define grok patterns in the indexer.

Create the file /etc/apache2/conf.d/apache-json-logging on your web server and add the following content:

LogFormat "{ \
            \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
            \"@version\": \"1\", \
            \"tags\":[\"apache\"], \
            \"message\": \"%h %l %u %t \\\"%r\\\" %>s %b\", \
            \"clientip\": \"%a\", \
            \"duration\": %D, \
            \"status\": %>s, \
            \"request\": \"%U%q\", \
            \"urlpath\": \"%U\", \
            \"urlquery\": \"%q\", \
            \"bytes\": %B, \
            \"method\": \"%m\", \
            \"site\": \"%{Host}i\", \
            \"referer\": \"%{Referer}i\", \
            \"useragent\": \"%{User-agent}i\" \
           }" apache_access_json

See http://httpd.apache.org/docs/2.2/mod/mod_log_config.html for more information on log format variables.
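As a quick sanity check, here is a minimal Python sketch that parses one line of the shape this LogFormat should produce. The sample line and its values are made up for illustration (not captured from a real server), and parse_access_line is a hypothetical helper name. Keep in mind that Apache escapes some special characters in logged values as \xhh, which is not a valid JSON escape, so an occasional line may fail to parse.

```python
import json

# Hypothetical sample line, shaped like the output of the LogFormat above.
SAMPLE_LINE = (
    '{ "@timestamp": "2014-06-01T12:34:56+0200", "@version": "1", '
    '"tags":["apache"], '
    '"message": "192.0.2.10 - - [01/Jun/2014:12:34:56 +0200] '
    '\\"GET /index.html?x=1 HTTP/1.1\\" 200 512", '
    '"clientip": "192.0.2.10", "duration": 1532, "status": 200, '
    '"request": "/index.html?x=1", "urlpath": "/index.html", '
    '"urlquery": "?x=1", "bytes": 512, "method": "GET", '
    '"site": "my_site.com", "referer": "-", "useragent": "curl/7.35.0" }'
)


def parse_access_line(line):
    """Return the event as a dict, or None if the line is not a JSON object."""
    try:
        event = json.loads(line)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    return event if isinstance(event, dict) else None


event = parse_access_line(SAMPLE_LINE)
print(event["status"], event["duration"], event["urlquery"])  # prints: 200 1532 ?x=1
```

Numeric fields such as duration, status and bytes come out as numbers because the LogFormat leaves them unquoted.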

Now you need to tell one or more of your VHosts to use this custom log format. Edit one of your available sites and add the following line:

CustomLog       ${APACHE_LOG_DIR}/my_site.com_access.log.json apache_access_json

Restart the web server:

$ sudo service apache2 restart

Setup logstash-forwarder

You can follow the steps described in my previous post to install logstash-forwarder on your web server.

Once installed, you’ll need to create the configuration file /etc/logstash-forwarder with the following content:

{
    "network": {
        "servers": [
            "logstash-node:5000"
        ],
        "ssl ca": "/etc/ssl/logstash.pub",
        "timeout": 15
    },
    "files": [
        {
            "paths": [
                "/var/log/apache2/my_site.com_access.log.json"
            ],
            "fields": {
                "type": "apache"
            }
        }
    ]
}
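Since this file is JSON, a quick parse can catch syntax slips such as a trailing comma before you restart the service. The sketch below is only an illustration: it embeds a copy of the configuration above as a string, and summarize_forwarder_config is a hypothetical helper, not part of logstash-forwarder.

```python
import json

# Copy of the forwarder configuration shown above.
CONFIG_TEXT = """
{
    "network": {
        "servers": ["logstash-node:5000"],
        "ssl ca": "/etc/ssl/logstash.pub",
        "timeout": 15
    },
    "files": [
        {
            "paths": ["/var/log/apache2/my_site.com_access.log.json"],
            "fields": {"type": "apache"}
        }
    ]
}
"""


def summarize_forwarder_config(text):
    """Parse the config and return (servers, watched paths).

    Raises ValueError if the text is not valid JSON.
    """
    config = json.loads(text)
    servers = config["network"]["servers"]
    # Flatten the "paths" lists of every entry under "files".
    paths = [path for entry in config["files"] for path in entry["paths"]]
    return servers, paths


servers, paths = summarize_forwarder_config(CONFIG_TEXT)
print(servers, paths)
```

The watched path should match the CustomLog file name configured in Apache, otherwise nothing will be shipped.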

Restart the service to reload the configuration:

$ sudo service logstash-forwarder restart

Process the logs in the logstash indexer

Add the following input definition, inside the input { } block, to the configuration file:

/etc/logstash/conf.d/00_input.conf

lumberjack {
           port => 5000
           tags => ["apache_access_json"]
           codec => "json"
           ssl_certificate => "/etc/ssl/logstash.pub"
           ssl_key => "/etc/ssl/logstash.key"
}

This configuration tells logstash to listen for events sent over the lumberjack protocol. It listens on port 5000, uses the json codec to decode events, and automatically adds the tag apache_access_json to each one.

See http://logstash.net/docs/1.4.1/inputs/lumberjack for more information on this input.

Then create the following filter configuration file:

/etc/logstash/conf.d/02_filter_apache.conf

filter {
       if "apache_access_json" in [tags] {

          if [useragent] != "-" and [useragent] != "" {
             useragent {
                       add_tag => [ "UA" ]
                       source => "useragent"
                       prefix => "UA-"
             }
          }

          mutate {
                 convert => ['duration', 'float']
          }

          ruby {
               code => "event['duration']/=1000000"
          }

          if [bytes] == 0 { mutate { remove_field => "[bytes]" } }
          if [urlquery]              == "" { mutate { remove_field => "urlquery" } }
          if [method]    =~ "(HEAD|OPTIONS)" { mutate { remove_field => "method" } }
          if [useragent] == "-"              { mutate { remove_field => "useragent" } }
          if [referer]   == "-"              { mutate { remove_field => "referer" } }

          if "UA" in [tags] {
             if [device] == "Other" { mutate { remove_field => "device" } }
             if [name]   == "Other" { mutate { remove_field => "name" } }
             if [os]     == "Other" { mutate { remove_field => "os" } }
          }
       }
}

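This configuration file applies a few filters to events tagged apache_access_json: it parses the user agent, converts the duration from microseconds to seconds, and removes fields that are empty or only hold a placeholder such as "-" or "Other".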

You’ll notice fields such as bytes, useragent, duration… These fields are set automatically by logstash when it receives the event, thanks to the json codec.
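To make the effect of the mutate and ruby steps concrete, here is a rough Python sketch that mimics them on a plain dictionary. It is not how logstash runs the filter: apply_apache_filter is a hypothetical helper, the sample event is made up, and the useragent parsing step is deliberately left out.

```python
import re


def apply_apache_filter(event):
    """Mimic the duration conversion and field clean-up of the filter above."""
    if "apache_access_json" not in event.get("tags", []):
        return event  # the filter only touches tagged events

    out = dict(event)
    # %D is logged in microseconds; the filter converts it to seconds.
    if "duration" in out:
        out["duration"] = float(out["duration"]) / 1000000
    if out.get("bytes") == 0:
        out.pop("bytes")
    if out.get("urlquery") == "":
        out.pop("urlquery")
    # Same regular expression as the =~ test in the filter.
    if re.search(r"(HEAD|OPTIONS)", str(out.get("method", ""))):
        out.pop("method")
    for field in ("useragent", "referer"):
        if out.get(field) == "-":
            out.pop(field)
    return out


# Hypothetical event, as it might look after the json codec decoded it.
sample_event = {
    "tags": ["apache", "apache_access_json"],
    "duration": 2500000,
    "status": 200,
    "bytes": 0,
    "urlquery": "",
    "method": "GET",
    "referer": "-",
    "useragent": "-",
}
result = apply_apache_filter(sample_event)
print(result)
```

With this sample, only tags, duration (now in seconds), status and method remain, which keeps the indexed documents free of empty values.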

Now, you need to restart logstash to apply the changes:

$ sudo service logstash restart