Here I’ll describe how to use logstash and logstash-forwarder to harvest the apache access logs on a web server so you can centralize them in elasticsearch and kibana.
I recommend reading my previous post for the software setup: Centralized logging with an ELK stack (Elasticsearch-Logback-Kibana) on Ubuntu
Apache log format
Logstash can parse JSON through a dedicated input codec, so we’ll define a new JSON logging format for the apache access logs. That way we don’t have to define grok patterns in the indexer.
Create the file /etc/apache2/conf.d/apache-json-logging on your web server and add the following content:
LogFormat "{ \
            \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
            \"@version\": \"1\", \
            \"tags\":[\"apache\"], \
            \"message\": \"%h %l %u %t \\\"%r\\\" %>s %b\", \
            \"clientip\": \"%a\", \
            \"duration\": %D, \
            \"status\": %>s, \
            \"request\": \"%U%q\", \
            \"urlpath\": \"%U\", \
            \"urlquery\": \"%q\", \
            \"bytes\": %B, \
            \"method\": \"%m\", \
            \"site\": \"%{Host}i\", \
            \"referer\": \"%{Referer}i\", \
            \"useragent\": \"%{User-agent}i\" \
           }" apache_access_json
See http://httpd.apache.org/docs/2.2/mod/mod_log_config.html for more information on log format variables.
Now you need to tell one or more of your VHosts to use this custom log format. Edit one of your available sites and add the following line:
CustomLog ${APACHE_LOG_DIR}/my_site.com_access.log.json apache_access_json
Restart the web server:
$ sudo service apache2 restart
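Once a few requests have hit the site, it can be worth checking that the lines written to the log really are valid JSON before shipping them. The snippet below is a minimal sketch of such a check in Python. The sample line is made up to match the format defined above, and the function name check_line is only an illustration, not part of any tool used in this post.

```python
import json

# Hypothetical line, shaped like what the apache_access_json format would write.
SAMPLE_LINE = (
    r'{ "@timestamp": "2014-06-01T12:00:00+0200", "@version": "1", '
    r'"tags":["apache"], '
    r'"message": "192.0.2.10 - - [01/Jun/2014:12:00:00 +0200] '
    r'\"GET /index.html?x=1 HTTP/1.1\" 200 512", '
    r'"clientip": "192.0.2.10", "duration": 2500, "status": 200, '
    r'"request": "/index.html?x=1", "urlpath": "/index.html", "urlquery": "?x=1", '
    r'"bytes": 512, "method": "GET", "site": "my_site.com", '
    r'"referer": "-", "useragent": "curl/7.35.0" }'
)

# Fields the log format writes without quotes, so they should decode as numbers.
NUMERIC_FIELDS = ("duration", "status", "bytes")


def check_line(line):
    """Return the decoded event, or raise ValueError if the line is unusable."""
    event = json.loads(line)  # a JSON syntax error is raised as a ValueError
    for field in NUMERIC_FIELDS:
        if not isinstance(event.get(field), int):
            raise ValueError("field %r is missing or not an integer" % field)
    return event


event = check_line(SAMPLE_LINE)
print(event["status"], event["request"])
```

To run it against the real log, call check_line on each line of the file instead of SAMPLE_LINE. A line that fails to decode is the kind of problem you want to catch here rather than in the indexer.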
Setup logstash-forwarder
You can follow the steps described in my previous post to install logstash-forwarder on your web server.
Once installed, you’ll need to create the configuration file /etc/logstash-forwarder with the following content:
{
  "network": {
    "servers": [ "logstash-node:5000" ],
    "ssl ca": "/etc/ssl/logstash.pub",
    "timeout": 15
  },
  "files": [
    {
      "paths": [ "/var/log/apache2/my_site.com_access.log.json" ],
      "fields": { "type": "apache" }
    }
  ]
}
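A stray comma or a missing bracket in this file is easy to overlook, so here is a small Python sketch that checks the structure of the configuration. It only looks at the keys used in this post (network.servers and files[].paths). The function name find_config_problems is hypothetical, and this is in no way the validation that logstash-forwarder performs itself.

```python
import json

# Same structure as the configuration shown above.
SAMPLE_CONFIG = """
{
  "network": {
    "servers": [ "logstash-node:5000" ],
    "ssl ca": "/etc/ssl/logstash.pub",
    "timeout": 15
  },
  "files": [
    {
      "paths": [ "/var/log/apache2/my_site.com_access.log.json" ],
      "fields": { "type": "apache" }
    }
  ]
}
"""


def find_config_problems(text):
    """Return a list of readable problems; an empty list means none was spotted."""
    try:
        config = json.loads(text)
    except ValueError as error:
        return ["not valid JSON: %s" % error]
    problems = []
    servers = config.get("network", {}).get("servers", [])
    if not servers:
        problems.append("network.servers is empty")
    for server in servers:
        # Each server is expected in host:port form, e.g. logstash-node:5000
        host, _, port = server.rpartition(":")
        if not host or not port.isdigit():
            problems.append("server %r is not in host:port form" % server)
    for entry in config.get("files", []):
        if not entry.get("paths"):
            problems.append("a files entry has no paths")
    return problems


print(find_config_problems(SAMPLE_CONFIG))
```

To check the real file, pass its content to the function instead of SAMPLE_CONFIG.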
Restart the service to reload the configuration:
$ sudo service logstash-forwarder restart
Process the logs in the logstash indexer
Add the following input definition to the input section of the configuration file:
/etc/logstash/conf.d/00_input.conf
lumberjack {
  port => 5000
  tags => ["apache_access_json"]
  codec => "json"
  ssl_certificate => "/etc/ssl/logstash.pub"
  ssl_key => "/etc/ssl/logstash.key"
}
This configuration tells logstash to listen for and receive events via the lumberjack protocol. It listens on port 5000, uses the json codec to decode events and automatically adds the tag apache_access_json to each of them.
See http://logstash.net/docs/1.4.1/inputs/lumberjack for more information on this input.
/etc/logstash/conf.d/02_filter_apache.conf
filter {
  if "apache_access_json" in [tags] {
    # Parse the user agent string, unless it is empty or the "-" placeholder
    if [useragent] != "-" and [useragent] != "" {
      useragent {
        add_tag => [ "UA" ]
        source => "useragent"
        prefix => "UA-"
      }
    }
    # %D logs the duration in microseconds, convert it to seconds
    mutate {
      convert => ['duration', 'float']
    }
    ruby {
      code => "event['duration']/=1000000"
    }
    # Drop empty or placeholder fields
    if [bytes] == 0 {
      mutate { remove_field => "[bytes]" }
    }
    if [urlquery] == "" {
      mutate { remove_field => "urlquery" }
    }
    if [method] =~ "(HEAD|OPTIONS)" {
      mutate { remove_field => "method" }
    }
    if [useragent] == "-" {
      mutate { remove_field => "useragent" }
    }
    if [referer] == "-" {
      mutate { remove_field => "referer" }
    }
    # The useragent filter above prefixes its fields with "UA-"
    if "UA" in [tags] {
      if [UA-device] == "Other" {
        mutate { remove_field => "UA-device" }
      }
      if [UA-name] == "Other" {
        mutate { remove_field => "UA-name" }
      }
      if [UA-os] == "Other" {
        mutate { remove_field => "UA-os" }
      }
    }
  }
}
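This configuration file applies some filters to events tagged as apache_access_json: it parses the user agent, converts the duration from microseconds to seconds and removes fields that are empty or carry no information.
You’ll notice fields such as bytes, useragent, duration… These fields are set automatically by logstash when it receives the event, thanks to the json codec.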
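To make the effect of these filters more concrete, here is a rough Python sketch of the same clean-up applied to a single event. It only illustrates the logic and is not how logstash runs it. The sample event and the function name clean_event are made up, and the user agent parsing step is left out.

```python
import re

# Hypothetical event, as it could look right after the json codec decoded it.
SAMPLE_EVENT = {
    "status": 200,
    "duration": 2500,   # microseconds, as logged by %D
    "bytes": 0,
    "urlquery": "",
    "method": "GET",
    "useragent": "-",
    "referer": "-",
}


def clean_event(event):
    """Mimic the conditionals of the logstash filter on one event dict."""
    cleaned = dict(event)
    # Convert the duration to a float and scale it from microseconds to seconds
    cleaned["duration"] = float(cleaned["duration"]) / 1000000
    if cleaned.get("bytes") == 0:
        del cleaned["bytes"]
    if cleaned.get("urlquery") == "":
        del cleaned["urlquery"]
    # Like =~ in logstash, this matches anywhere in the method string
    if re.search(r"(HEAD|OPTIONS)", cleaned.get("method", "")):
        del cleaned["method"]
    # "-" is what apache logs when the header is absent
    for field in ("useragent", "referer"):
        if cleaned.get(field) == "-":
            del cleaned[field]
    return cleaned


print(clean_event(SAMPLE_EVENT))
```

With this sample, only status, method and a duration expressed in seconds remain. Everything else is dropped because it carries no information.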
Now, you need to restart logstash to apply the changes:
$ sudo service logstash restart