High-availability with HAProxy and keepalived on Ubuntu 12.04

‘Lo there !

Here is a little post on how you can easily set up a highly available HAProxy service on Ubuntu 12.04 ! I've been using HAProxy more and more lately, adding more backends and connections to it. Then I thought: what if it goes down? How can I ensure high availability for that service?

Enter keepalived, which lets you set up another HAProxy node to create an active/passive cluster. If the main HAProxy node goes down, the second one takes over.

In the following examples, I assume the following:

  • Master node address: 10.10.1.1
  • Slave node address: 10.10.1.2
  • Highly available HAProxy virtual address: 10.10.1.3

Install HAProxy

You’ll need to install it on both nodes:

$ sudo apt-get install haproxy

Now, edit the file /etc/default/haproxy and set the property ENABLED to 1.
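If you prefer a one-liner, something like this should do it (assuming the default ENABLED=0 line is present):

$ sudo sed -i 's/^ENABLED=0$/ENABLED=1/' /etc/default/haproxy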

Start the service, and you’re done 🙂

$ sudo service haproxy start

Install keepalived

Prerequisite

You’ll need to update your sysctl configuration to allow binding to non-local addresses:

$ echo "net.ipv4.ip_nonlocal_bind = 1" | sudo tee -a /etc/sysctl.conf
$ sudo sysctl -p

Setup

Install the package:

$ sudo apt-get install keepalived

Create the configuration file /etc/keepalived/keepalived.conf on the master node:

/etc/keepalived/keepalived.conf

global_defs {
  # Keepalived process identifier
  lvs_id haproxy_KA
}

# Script used to check if HAProxy is running
vrrp_script check_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

# Virtual interface
vrrp_instance VIP_01 {
  state MASTER
  interface eth0
  virtual_router_id 7
  priority 101

  virtual_ipaddress {
    10.10.1.3
  }

  track_script {
    check_haproxy
  }
}

Do the same for the slave node, with a few changes (note the BACKUP state and the lower priority):

/etc/keepalived/keepalived.conf

global_defs {
  # Keepalived process identifier
  lvs_id haproxy_KA_passive
}

# Script used to check if HAProxy is running
vrrp_script check_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

# Virtual interface
vrrp_instance VIP_01 {
  state BACKUP
  interface eth0
  virtual_router_id 7
  priority 100

  virtual_ipaddress {
    10.10.1.3
  }

  track_script {
    check_haproxy
  }
}

WARNING: Be sure to assign a virtual_router_id that is unique on the 10.10.1.0 subnet for this keepalived configuration.

Last step: start the keepalived service on the master node first, then on the slave.

$ sudo service keepalived start

You can check that the virtual IP address is created with the following command on the master node:

$ ip a | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
 inet 10.10.1.1/25 brd 10.10.1.127 scope global eth0
 inet 10.10.1.3/32 scope global eth0

If you stop the HAProxy service on the master node or shut the node down, the virtual IP will be transferred to the passive node. You can use the previous command to verify that the VIP has moved.
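A quick failover test could look like this:

# On the master node
$ sudo service haproxy stop

# On the slave node, the VIP should show up after a few seconds
$ ip a | grep eth0

Don’t forget to start HAProxy again on the master afterwards.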

 

Puppet & Foreman : The scalable way

Oye there !

We’ve been struggling with a single Puppetmaster for 250+ nodes at my company over the last few months, so I decided to review the current architecture and move toward something more evolved. Managing the nodes in site.pp never felt natural to me, and the file was becoming unmaintainable.

So I googled around for alternatives and found something called an ENC (External Node Classifier), and more specifically Foreman.

I’ve also decided to add high availability and load balancing in my architecture, and I’ve come up with the following:

Puppet & Foreman architecture

I’ll describe in this post how to setup the same stack on Ubuntu Server 12.04 nodes using Foreman 1.5.1 and Puppet 3.6.2.

Requirements

Nodes

As you can see in the diagram above, this setup is composed of 10 nodes.

You’ll need to set up the following nodes (example IPs are given):

  • foreman.domain – 10.0.0.1
  • foreman-enc.domain – 10.0.0.2
  • foreman-reports.domain – 10.0.0.3
  • foreman-db.domain – 10.0.0.4
  • memcached.domain – 10.0.0.5
  • puppetmaster-1.domain – 10.0.0.6
  • puppetmaster-2.domain – 10.0.0.7
  • puppet-ca.domain – 10.0.0.8
  • puppet-lb.domain – 10.0.0.9
  • puppet-lb-passive.domain – 10.0.0.10

A virtual IP will be required for the highly available load balancer: 10.0.0.11.

Puppet

Ensure you have Puppet installed on each node:

$ wget https://apt.puppetlabs.com/puppetlabs-release-precise.deb
$ sudo dpkg -i puppetlabs-release-precise.deb
$ sudo apt-get update && sudo apt-get install puppet

SSL (optional)

At my company, we have our own SSL certificates for our internal services.

If you want to setup this kind of architecture with your certificates, ensure you have the following files on all the nodes:

  • /etc/ssl/domain/certs/ca.pem
  • /etc/ssl/domain/certs/domain.pem
  • /etc/ssl/domain/private_keys/domain.pem

Of course you can skip this step; in that case, you’ll also need to skip the SSL-related configuration steps in the next parts.

Setup the Foreman block

This part covers the Foreman block: the MySQL database, the memcached server and the Foreman instances (main, ENC and reports).

MySQL server

The installation of a MySQL server is beyond the scope of this post; just ensure you have a MySQL instance running.

Then create a database for foreman and the required users:

CREATE DATABASE foreman CHARACTER SET utf8;
CREATE USER 'foreman'@'foreman.domain';
GRANT ALL PRIVILEGES ON foreman.* TO 'foreman'@'foreman.domain' IDENTIFIED BY 'foreman_password';
CREATE USER 'foreman'@'foreman-enc.domain';
GRANT ALL PRIVILEGES ON foreman.* TO 'foreman'@'foreman-enc.domain' IDENTIFIED BY 'foreman_password';
CREATE USER 'foreman'@'foreman-reports.domain';
GRANT ALL PRIVILEGES ON foreman.* TO 'foreman'@'foreman-reports.domain' IDENTIFIED BY 'foreman_password';
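Before going further, you can check from the Foreman nodes that the database is reachable, for example from foreman.domain (this assumes the mysql-client package is installed):

$ mysql -h foreman-db.domain -u foreman -p foreman -e 'SELECT 1;'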

Memcached server

It’s quite easy to setup a memcached node. The package is available in the Ubuntu repositories.

$ sudo apt-get install memcached

Once installed, update the following values in the configuration file /etc/memcached.conf:

-m 128
-l 0.0.0.0

Then restart the service.

$ sudo service memcached restart
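You can quickly verify that memcached is reachable from another node, for example with netcat:

$ nc -zv memcached.domain 11211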

Foreman (main instance)

Setup the foreman-installer from the debian repository:

$ wget -q http://deb.theforeman.org/foreman.asc -O- | sudo apt-key add -
$ echo "deb http://deb.theforeman.org/ $(lsb_release -sc) 1.5" | sudo tee -a /etc/apt/sources.list.d/foreman.list
$ echo "deb http://deb.theforeman.org/ plugins 1.5" | sudo tee -a /etc/apt/sources.list.d/foreman.list
$ sudo apt-get update && sudo apt-get install foreman-installer

Setup the foreman instance via foreman-installer:

$ sudo foreman-installer \
--no-enable-puppet \
--no-enable-foreman-plugin-bootdisk \
--no-enable-foreman-proxy \
--foreman-db-adapter=mysql2 \
--foreman-db-database=foreman \
--foreman-db-host=foreman-db.domain \
--foreman-db-manage=true \
--foreman-db-username=foreman \
--foreman-db-password='foreman_password' \
--foreman-db-port=3306 \
--foreman-db-type=mysql \
--foreman-server-ssl-ca=/etc/ssl/domain/certs/ca.pem \
--foreman-server-ssl-chain=/etc/ssl/domain/certs/ca.pem \
--foreman-server-ssl-cert=/etc/ssl/domain/certs/domain.pem \
--foreman-server-ssl-key=/etc/ssl/domain/private_keys/domain.pem

Now that the service is installed, we need to setup the memcached plugin:

$ echo "gem 'foreman_memcache'" | sudo tee -a /usr/share/foreman/bundler.d/Gemfile.local.rb
$ sudo chown foreman:foreman /usr/share/foreman/bundler.d/Gemfile.local.rb
$ cd ~foreman
$ sudo -u foreman bundle update foreman_memcache

Then configure it by appending the following content to /etc/foreman/settings.yaml:

:memcache:
  :hosts:
    - memcached.domain:11211
  :options:
    :namespace: foreman
    :expires_in: 86400
    :compress: true

And restart the Apache service:

$ sudo service apache2 restart

You can now access your foreman instance at http://foreman.domain and log in with the default credentials: admin/changeme.

Once logged in, you’ll need to retrieve some values from the settings and also set a few others. Go to Administer > Settings > Auth and retrieve the following settings:

  • oauth_consumer_key
  • oauth_consumer_secret

These settings will be used in the following setup procedures; you’ll need to replace the values MY_CONSUMER_KEY and MY_CONSUMER_SECRET with the values you’ve retrieved.

We also need to set the following settings:

  • require_ssl_puppetmasters=false
  • trusted_puppetmaster_hosts=[puppet.domain, puppet-lb.domain, puppet-lb-passive.domain]

SSL (optional):

Update the following settings in Administer > Settings > Provisioning:

  • ssl_ca_file = /etc/ssl/domain/certs/ca.pem
  • ssl_certificate = /etc/ssl/domain/certs/domain.pem
  • ssl_priv_key = /etc/ssl/domain/private_keys/domain.pem

Foreman ENC & Reports

Apply these configurations on each of the two nodes (foreman-enc.domain and foreman-reports.domain).

Setup the foreman-installer from the debian repository:

$ wget -q http://deb.theforeman.org/foreman.asc -O- | sudo apt-key add -
$ echo "deb http://deb.theforeman.org/ $(lsb_release -sc) 1.5" | sudo tee -a /etc/apt/sources.list.d/foreman.list
$ echo "deb http://deb.theforeman.org/ plugins 1.5" | sudo tee -a /etc/apt/sources.list.d/foreman.list
$ sudo apt-get update && sudo apt-get install foreman-installer

Setup the foreman instance via foreman-installer:

$ sudo foreman-installer \
--no-enable-puppet \
--no-enable-foreman-plugin-bootdisk \
--no-enable-foreman-proxy \
--foreman-db-adapter=mysql2 \
--foreman-db-database=foreman \
--foreman-db-host=foreman-db.domain \
--foreman-db-manage=false \
--foreman-db-username=foreman \
--foreman-db-password='foreman_password' \
--foreman-db-port=3306 \
--foreman-db-type=mysql \
--foreman-server-ssl-ca=/etc/ssl/domain/certs/ca.pem \
--foreman-server-ssl-chain=/etc/ssl/domain/certs/ca.pem \
--foreman-server-ssl-cert=/etc/ssl/domain/certs/domain.pem \
--foreman-server-ssl-key=/etc/ssl/domain/private_keys/domain.pem \
--foreman-oauth-consumer-key=MY_CONSUMER_KEY \
--foreman-oauth-consumer-secret=MY_CONSUMER_SECRET

After the service is installed, we need to set up the memcached plugin as before:

$ echo "gem 'foreman_memcache'" | sudo tee -a /usr/share/foreman/bundler.d/Gemfile.local.rb
$ sudo chown foreman:foreman /usr/share/foreman/bundler.d/Gemfile.local.rb
$ cd ~foreman
$ sudo -u foreman bundle update foreman_memcache

Then configure it by appending the following content to /etc/foreman/settings.yaml:

:memcache:
  :hosts:
    - memcached.domain:11211
  :options:
    :namespace: foreman
    :expires_in: 86400
    :compress: true

And restart the Apache service:

$ sudo service apache2 restart

Setup the Puppet block

This part covers the Puppet block: the Puppet CA, the two Puppet masters and the load balancers.

Puppet CA

Setup the foreman-installer from the debian repository:

$ wget -q http://deb.theforeman.org/foreman.asc -O- | sudo apt-key add -
$ echo "deb http://deb.theforeman.org/ $(lsb_release -sc) 1.5" | sudo tee -a /etc/apt/sources.list.d/foreman.list
$ echo "deb http://deb.theforeman.org/ plugins 1.5" | sudo tee -a /etc/apt/sources.list.d/foreman.list
$ sudo apt-get update && sudo apt-get install foreman-installer

Clear your Puppet SSL folder before the installation:

$ sudo rm -rf /var/lib/puppet/ssl/*

Setup the puppetmaster instance via foreman-installer:

$ sudo foreman-installer \
--no-enable-foreman-plugin-bootdisk \
--no-enable-foreman-plugin-setup \
--no-enable-foreman \
--foreman-proxy-foreman-base-url=https://foreman.domain \
--foreman-proxy-tftp=false \
--foreman-proxy-ssl-ca=/etc/ssl/domain/certs/ca.pem \
--foreman-proxy-ssl-cert=/etc/ssl/domain/certs/domain.pem \
--foreman-proxy-ssl-key=/etc/ssl/domain/private_keys/domain.pem \
--foreman-proxy-oauth-consumer-key=MY_CONSUMER_KEY \
--foreman-proxy-oauth-consumer-secret=MY_CONSUMER_SECRET

Once the service is installed, you’ll need to generate certificates for your puppet masters:

$ sudo puppet cert generate puppetmaster-1.domain --dns_alt_names=puppet,puppet.domain,puppetmaster-1.domain
$ sudo puppet cert generate puppetmaster-2.domain --dns_alt_names=puppet,puppet.domain,puppetmaster-2.domain

This will generate the following files:

  • /var/lib/puppet/ssl/certs/puppetmaster-1.domain.pem
  • /var/lib/puppet/ssl/certs/puppetmaster-2.domain.pem
  • /var/lib/puppet/ssl/private_keys/puppetmaster-1.domain.pem
  • /var/lib/puppet/ssl/private_keys/puppetmaster-2.domain.pem

Puppetmasters 1 & 2

Retrieve the certificates generated previously and put them in the same folders on the puppetmaster nodes:

  • /var/lib/puppet/ssl/certs/
  • /var/lib/puppet/ssl/private_keys/

Ensure the permissions are OK:

$ sudo chown -R puppet:puppet /var/lib/puppet/ssl

Setup the foreman-installer from the debian repository:

$ wget -q http://deb.theforeman.org/foreman.asc -O- | sudo apt-key add -
$ echo "deb http://deb.theforeman.org/ $(lsb_release -sc) 1.5" | sudo tee -a /etc/apt/sources.list.d/foreman.list
$ echo "deb http://deb.theforeman.org/ plugins 1.5" | sudo tee -a /etc/apt/sources.list.d/foreman.list
$ sudo apt-get update && sudo apt-get install foreman-installer

Setup the puppetmaster instances using the foreman-installer:

$ sudo foreman-installer \
--no-enable-foreman-plugin-bootdisk \
--no-enable-foreman-plugin-setup \
--no-enable-foreman \
--puppet-ca-server=puppet.domain \
--puppet-server-ca=false \
--puppet-server-foreman-ssl-ca=/etc/ssl/domain/certs/ca.pem \
--puppet-server-foreman-ssl-cert=/etc/ssl/domain/certs/domain.pem \
--puppet-server-foreman-ssl-key=/etc/ssl/domain/private_keys/domain.pem \
--puppet-server-foreman-url=https://foreman.domain \
--foreman-proxy-foreman-base-url=https://foreman.domain \
--foreman-proxy-tftp=false \
--foreman-proxy-puppetca=false \
--foreman-proxy-ssl-ca=/etc/ssl/domain/certs/ca.pem \
--foreman-proxy-ssl-cert=/etc/ssl/domain/certs/domain.pem \
--foreman-proxy-ssl-key=/etc/ssl/domain/private_keys/domain.pem \
--foreman-proxy-oauth-consumer-key=MY_CONSUMER_KEY \
--foreman-proxy-oauth-consumer-secret=MY_CONSUMER_SECRET

Now edit your Puppet configuration under /etc/puppet/puppet.conf and add the following lines under the appropriate sections:

[main]
ca_port = 8141
[master]
dns_alt_names = puppet,puppet.domain,puppetmaster-X.domain

Note: replace puppetmaster-X.domain with the FQDN of the instance you are setting up (either puppetmaster-1.domain or puppetmaster-2.domain).

After that, we’ll need to tell the puppetmaster to use the foreman-enc node as the ENC. Edit the Foreman URL in the file /etc/puppet/node.rb so it points to foreman-enc.domain.

Next, we’ll need to tell the puppetmaster to use the foreman-reports node for the reports. Edit the Foreman URL in the file /usr/lib/ruby/vendor_ruby/puppet/reports/foreman.rb so it points to foreman-reports.domain.

Restart Apache and you’re done with the puppetmaster setup:

$ sudo service apache2 restart

HAProxy

You’ll need to setup haproxy and keepalived on the following nodes:

  • puppet-lb.domain
  • puppet-lb-passive.domain

Install haproxy and keepalived on both nodes:

$ sudo apt-get install haproxy keepalived

Configure keepalived

On the active node (puppet-lb.domain), create a file /etc/keepalived/keepalived.conf with the following content:

global_defs {
  lvs_id lb_internal_KA
}

vrrp_script check_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

vrrp_instance VIP_01 {
  state MASTER
  interface eth0
  virtual_router_id 57
  priority 101

  virtual_ipaddress {
    10.0.0.11
  }

  track_script {
    check_haproxy
  }
}

On the passive node (puppet-lb-passive.domain), the content of this file will be different:

global_defs {
  lvs_id lb_internal_KA_passive
}

vrrp_script check_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

vrrp_instance VIP_01 {
  state BACKUP
  interface eth0
  virtual_router_id 57
  priority 100

  virtual_ipaddress {
    10.0.0.11
  }

  track_script {
    check_haproxy
  }
}

NOTE: The virtual_router_id directive NEEDS to have a unique value; if another keepalived cluster exists on the same network/VLAN and uses the same ID, you will run into conflicts.
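As in the first post, start (or restart) the keepalived service on both nodes, master node first:

$ sudo service keepalived restart

You can then check with ip a that the virtual IP 10.0.0.11 is bound on the active node.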

Configure haproxy

Here is a sample configuration for haproxy; put it in /etc/haproxy/haproxy.cfg on both nodes (puppet-lb.domain & puppet-lb-passive.domain):

global
        log 127.0.0.1   local1
        maxconn 4096
        user haproxy
        group haproxy
        daemon

defaults
        log     global
        option  dontlognull
        retries 3
        option redispatch
        maxconn 2000
        contimeout      300000
        clitimeout      300000
        srvtimeout      300000

listen  stats *:1936
        mode http
        stats enable
        stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /


listen puppet    *:8140
       mode      tcp
       option    tcplog
       option    ssl-hello-chk
       server    puppetmaster-1        puppetmaster-1.domain:8140     check inter 5000 fall 3 rise 2
       server    puppetmaster-2        puppetmaster-2.domain:8140     check inter 5000 fall 3 rise 2

listen  puppetca  *:8141
        mode      tcp
        option    tcplog
        option    ssl-hello-chk
        option    abortonclose
        server    puppetca-1            puppet-ca.domain:8140                check inter 5000 fall 3 rise 2

Now, restart your haproxy service:

$ sudo service haproxy restart
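Thanks to the stats listener defined above, you can verify that both puppetmaster backends are seen as UP by opening http://puppet-lb.domain:1936/ in a browser, or from the command line:

$ curl -s http://puppet-lb.domain:1936/ | grep puppetmaster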

And that’s it ! Your Foreman/Puppet model is ready !

The node configuration

You will need to add the following properties under the [agent] section of the Puppet configuration file /etc/puppet/puppet.conf on each node:

[agent]
    report        = true
    masterport    = 8140
    server        = puppet.domain
    ca_port       = 8141
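Once a node is configured, you can run the agent manually; the certificate request can then be signed from the Foreman UI (through the smart proxy) or directly on the Puppet CA node. For example (mynode.domain is a placeholder for your agent’s FQDN):

# On the agent node
$ sudo puppet agent --test

# On the Puppet CA node, list and sign pending requests
$ sudo puppet cert list
$ sudo puppet cert sign mynode.domain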

Suggestions and evolutions

Modules synchronisation between the Puppet masters

I recommend the use of r10k to easily synchronize your modules between the Puppet masters.

More information here: https://github.com/puppetlabs/r10k

The main SPOF: the Foreman ENC

At the moment, the main SPOF in this architecture is the Foreman ENC box. I guess you could add an Apache service in front of the Foreman services to achieve an active/passive Foreman ENC. You could also point the ENC URL on the Puppet masters to the Foreman reports box while you diagnose a problem with the ENC.

High availability on the database

Of course, the other SPOF in this architecture is MySQL. There are many existing solutions for MySQL high availability; you could use another HAProxy service in front of your MySQL instances for an active/passive cluster and set up a replication architecture.

High availability on the Puppet CA

As for the Puppet CA, I do not consider it a SPOF, because if the service goes down the stack will keep working; you just won’t be able to add new nodes.

You could make the service highly available by adding another Puppet CA behind the load balancer (a passive CA for example), but you would need a way to synchronize your certificates between your Puppet CA servers, maybe using rsync or an NFS mount point.

Special thanks and feedback

Feel free to ask any questions or comment on this blog, or you can catch me on Freenode IRC, @deviantony on channel #theforeman.

Finally, I’d like to thank the people who helped me set up this architecture: @elobato, @dominic and @gwmngilfen from the channel #theforeman. Do not hesitate to ask questions over there, the channel is pretty active and the Foreman team members are really responsive !

Install PhantomJS as a service on Ubuntu 12.04

Hello there,

I’ll show you how to install the headless WebKit browser PhantomJS 1.9.7 on Ubuntu 12.04 and how to manage it as a system service.

Installation

The following lines will download the phantomjs archive, extract it and create symbolic links to the binary in /usr/local/share, /usr/local/bin and /usr/bin.

$ cd /usr/local/share
$ sudo wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-linux-x86_64.tar.bz2
$ sudo tar xjf phantomjs-1.9.7-linux-x86_64.tar.bz2
$ sudo ln -s /usr/local/share/phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/local/share/phantomjs
$ sudo ln -s /usr/local/share/phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/local/bin/phantomjs
$ sudo ln -s /usr/local/share/phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/bin/phantomjs

You can retrieve the installation script on Gist: phantomjs_installer.

Service configuration

Once the binary is installed, we’ll create a script to manage the service in /etc/init.d and a configuration file in /etc/default.

You can also retrieve the init script on Gist: phantomjs init script.

/etc/init.d/phantomjs

#! /bin/sh
# Init. script for phantomjs, based on Ubuntu 12.04 skeleton.
# Author: Anthony Lapenna <lapenna.anthony@gmail.com>

PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="Phantomjs service"
NAME=phantomjs
DAEMON=/usr/bin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0

# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.2-14) to ensure that this file is present
# and status_of_proc is working.
. /lib/lsb/init-functions

# Overridable options section
WEBDRIVER_PORT=8190
DEBUG=false
LOG_FILE=/var/log/phantomjs.log

# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME

DAEMON_ARGS="--webdriver=$WEBDRIVER_PORT --debug=$DEBUG"

#
# Function that starts the daemon/service
#
do_start()
{
    # Return
    #   0 if daemon has been started
    #   1 if daemon was already running
    #   2 if daemon could not be started
    start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --test > /dev/null \
        || return 1
    start-stop-daemon --start --background --quiet --pidfile $PIDFILE --exec /bin/bash -- -c "$DAEMON $DAEMON_ARGS > $LOG_FILE 2>&1" \
        || return 2
    # Add code here, if necessary, that waits for the process to be ready
    # to handle requests from services started subsequently which depend
    # on this one. As a last resort, sleep for some time.
}

#
# Function that stops the daemon/service
#
do_stop()
{
    # Return
    #   0 if daemon has been stopped
    #   1 if daemon was already stopped
    #   2 if daemon could not be stopped
    #   other if a failure occurred
    start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --pidfile $PIDFILE --name $NAME
    RETVAL="$?"
    [ "$RETVAL" = 2 ] && return 2
    # Wait for children to finish too if this is a daemon that forks
    # and if the daemon is only ever run from this initscript.
    # If the above conditions are not satisfied then add some other code
    # that waits for the process to drop all resources that could be
    # needed by services started subsequently. A last resort is to
    # sleep for some time.
    start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --exec $DAEMON
    [ "$?" = 2 ] && return 2
    # Many daemons don't delete their pidfiles when they exit.
    rm -f $PIDFILE
    return "$RETVAL"
}

#
# Function that sends a SIGHUP to the daemon/service
#
do_reload() {
    #
    # If the daemon can reload its configuration without
    # restarting (for example, when it is sent a SIGHUP),
    # then implement that here.
    #
    start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --name $NAME
    return 0
}

case "$1" in
 start)
 [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
 do_start
 case "$?" in
 0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
 2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
 esac
 ;;
 stop)
 [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
 do_stop
 case "$?" in
 0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
 2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
 esac
 ;;
 status)
 status_of_proc "$DAEMON" "$NAME" && exit 0 || exit $?
 ;;
 #reload|force-reload)
 #
 # If do_reload() is not implemented then leave this commented out
 # and leave 'force-reload' as an alias for 'restart'.
 #
 #log_daemon_msg "Reloading $DESC" "$NAME"
 #do_reload
 #log_end_msg $?
 #;;
 restart|force-reload)
 #
 # If the "reload" option is implemented then remove the
 # 'force-reload' alias
 #
 log_daemon_msg "Restarting $DESC" "$NAME"
 do_stop
 case "$?" in
 0|1)
 do_start
 case "$?" in
 0) log_end_msg 0 ;;
 1) log_end_msg 1 ;; # Old process is still running
 *) log_end_msg 1 ;; # Failed to start
 esac
 ;;
 *)
 # Failed to stop
 log_end_msg 1
 ;;
 esac
 ;;
 *)
 echo "Usage: $SCRIPTNAME {start|stop|status|restart|force-reload}" >&2
 exit 3
 ;;
esac

:

This file needs to have execution permissions:

$ sudo chmod +x /etc/init.d/phantomjs

You can define overridable options in the following file:

/etc/default/phantomjs

WEBDRIVER_PORT=8190

If you want the service to start at boot time, type the following command:

$ sudo update-rc.d phantomjs defaults

And there you go, you can now manage phantomjs using the service start/stop/status commands. Note that the stop command will require a few seconds to shut down the process.
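To quickly check that the WebDriver endpoint answers, you can start the service and query the status URL exposed by GhostDriver (assuming the default port 8190 configured above):

$ sudo service phantomjs start
$ curl http://localhost:8190/status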

Enjoy!

Logstash – Debug configuration

This post is a continuation of my previous post about the ELK stack setup; see here: how to setup an ELK stack.

I’ll show you how I’m using the logstash indexer component to start a debug process in order to test the logstash filters.

The aim is to start the indexer reading from stdin, so you can type inputs on the command line and see the result directly on stdout.

Update: I’ve recently created a tool to start a volatile ELK stack, you can also use it to test your filters: check it here.

Configuration

You’ll need to set up a configuration equivalent to the default one in /etc/logstash/conf.d; let’s say /etc/logstash/debug.d, with the following files:

  • 00_input.conf
  • 02_filter_debug.conf
  • 10_output.conf

Input section

We are going to tell logstash to use stdin as its input.

/etc/logstash/debug.d/00_input.conf

input {
    stdin { }
}

Filter section

In the 02_filter_debug.conf file, you’ll define the filters you want to test.

filter {
    # Define the filters you want to test here, for example a grok pattern:
    grok {
        match => [ "message", "%{COMBINEDAPACHELOG}" ]
    }
}

Output section

The output will be stdout, so you can see the result (in JSON) of the filter processing directly in the console. We will also tell logstash to duplicate the output into a file.

/etc/logstash/debug.d/10_output.conf

output {
    stdout {
        codec => "json"
    }
    file {
        codec => "json"
        path => "/tmp/debug-filters.json"
    }
}

Debugging

Now that the configuration is done, you’ll need to start the logstash binary with the debug configuration folder as a parameter:

$ /opt/logstash/bin/logstash -f /etc/logstash/debug.d -l /var/log/logstash/logstash-debug.log

The agent will take a few seconds to start, and then you’re ready to debug ! All you’ve got to do is paste your text on the command line; logstash will apply the filters defined in the filter section to it, then output the result on the command line.

To interrupt the logstash process, you’ll need to type the following commands: Ctrl+C and then Ctrl+D.

Have fun with your logs !

Logstash recipe – MySQL slow log

I’ll describe here how to use logstash and logstash-forwarder to harvest the mysql slow log on a database server so you can centralize it in elasticsearch and kibana.

It is recommended to check my previous post for the software setup: Centralized logging with an ELK stack (Elasticsearch-Logstash-Kibana) on Ubuntu

Setup logstash-forwarder

You can follow the steps described in my previous post to install logstash-forwarder on your web server.

Once installed, you’ll need to create the configuration file /etc/logstash-forwarder with the following content:

{
        "network": {
                "servers": [ "logstash-node:5001" ],
                "ssl ca": "/etc/ssl/logstash.pub",
                "timeout": 15
        },
        "files": [
                {
                        "paths": [
                        "/var/log/mysql/slow_query.log",
                        ],
                        "fields": { "type": "mysql" }
                }
        ]
}
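This assumes the MySQL slow query log is enabled and written to /var/log/mysql/slow_query.log; if it isn’t, something along these lines in /etc/mysql/my.cnf should do it (option names may vary slightly with your MySQL version):

[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow_query.log
long_query_time     = 1

Then restart MySQL and the logstash-forwarder service:

$ sudo service mysql restart
$ sudo service logstash-forwarder restart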

Process the logs in the logstash indexer

Add the following input definition to the configuration file:

/etc/logstash/conf.d/00_input.conf

lumberjack {
           port => 5001
           tags => ["mysql_slow_log"]
           ssl_certificate => "/etc/ssl/logstash.pub"
           ssl_key => "/etc/ssl/logstash.key"
}

This configuration tells logstash to listen for and receive events via the lumberjack protocol. It will listen on port 5001 to receive events and automatically add a mysql_slow_log tag to each.

See http://logstash.net/docs/1.4.1/inputs/lumberjack for more information on this input.

/etc/logstash/conf.d/02_filter_mysql.conf

filter {
    if "MYSQL_SLOW_LOG" in [tags] {

        if [message] =~ "^# Time:.*$" {
                  drop {}
        }


        multiline {
                  pattern => "^# User@Host:.*$"
                  negate => true
                  what => "previous"
        }

        grok {
             match => [
                      "message", "(?m)^# User@Host: %{GREEDYDATA:user}\[%{GREEDYDATA}\] @ \[%{IP:client_ip}\]\s*# Query_time: %{NUMBER:query_time:float} Lock_time: %{NUMBER:query_lock_time:float} Rows_sent: %{NUMBER:query_rows_sent:int} Rows_examined: %{NUMBER:query_rows_examined:int}\s*SET timestamp=%{NUMBER:log_timestamp};\s*%{GREEDYDATA:query}$",
                      "message", "(?m)^# User@Host: %{GREEDYDATA:user}\[%{GREEDYDATA}\] @ \[%{IP:client_ip}\]\s*# Query_time: %{NUMBER:query_time:float} Lock_time: %{NUMBER:query_lock_time:float} Rows_sent: %{NUMBER:query_rows_sent:int} Rows_examined: %{NUMBER:query_rows_examined:int}\s*use %{GREEDYDATA:database};\s*SET timestamp=%{NUMBER:log_timestamp};\s*%{GREEDYDATA:query}$",
                      "message", "(?m)^# User@Host: %{GREEDYDATA:user}\[%{GREEDYDATA}\] @ %{GREEDYDATA:client_ip} \[\]\s*# Query_time: %{NUMBER:query_time:float} Lock_time: %{NUMBER:query_lock_time:float} Rows_sent: %{NUMBER:query_rows_sent:int} Rows_examined: %{NUMBER:query_rows_examined:int}\s*SET timestamp=%{NUMBER:log_timestamp};\s*%{GREEDYDATA:query}$"
                      ]
             }

        date {
             match => ["log_timestamp", "UNIX"]
        }

        mutate {
               remove_field => "log_timestamp"
        }
    }
}

This configuration file will apply some filters on events tagged as mysql_slow_log.
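For reference, here is roughly what a slow log entry looks like and what the multiline and grok filters above expect (values are made up):

# Time: 141016 10:23:54
# User@Host: appuser[appuser] @ [10.0.0.12]
# Query_time: 2.345678 Lock_time: 0.000123 Rows_sent: 10 Rows_examined: 12345
SET timestamp=1413447834;
SELECT * FROM orders WHERE customer_id = 42;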

Now, you need to restart logstash to apply the changes:

$ sudo service logstash restart


Logstash recipe – Apache access log

I’ll describe here how to use logstash and logstash-forwarder to harvest the apache access logs on a web server so you can centralize it in elasticsearch and kibana.

It is recommended to check my previous post for the software setup: Centralized logging with an ELK stack (Elasticsearch-Logstash-Kibana) on Ubuntu

Apache log format

Logstash is able to parse JSON using a specific input codec, so we’ll define a new logging format in JSON for the apache access logs; that way we don’t have to define grok patterns in the indexer.

Create the file /etc/apache2/conf.d/apache-json-logging on your webserver with the following content:

LogFormat "{ \
            \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
            \"@version\": \"1\", \
            \"tags\":[\"apache\"], \
            \"message\": \"%h %l %u %t \\\"%r\\\" %>s %b\", \
            \"clientip\": \"%a\", \
            \"duration\": %D, \
            \"status\": %>s, \
            \"request\": \"%U%q\", \
            \"urlpath\": \"%U\", \
            \"urlquery\": \"%q\", \
            \"bytes\": %B, \
            \"method\": \"%m\", \
            \"site\": \"%{Host}i\", \
            \"referer\": \"%{Referer}i\", \
            \"useragent\": \"%{User-agent}i\" \
           }" apache_access_json

See http://httpd.apache.org/docs/2.2/mod/mod_log_config.html for more information on log format variables.

Now you need to tell one or more of your defined VHosts to use this custom log format. Edit one of your available sites and add the following line:

CustomLog       ${APACHE_LOG_DIR}/my_site.com_access.log.json apache_access_json
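For reference, a minimal VirtualHost using this format could look like the following (server name and paths are placeholders):

<VirtualHost *:80>
    ServerName my_site.com
    DocumentRoot /var/www/my_site.com
    CustomLog ${APACHE_LOG_DIR}/my_site.com_access.log.json apache_access_json
</VirtualHost>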

Restart the web server:

$ sudo service apache2 restart

Setup logstash-forwarder

You can follow the steps described in my previous post to install logstash-forwarder on your web server.

Once installed, you’ll need to create the configuration file /etc/logstash-forwarder with the following content:

{
    "network": {
        "servers": [
            "logstash-node:5000"
        ],
        "ssl ca": "/etc/ssl/logstash.pub",
        "timeout": 15
    },
    "files": [
        {
            "paths": [
                "/var/log/apache2/my_site.com_access.log.json"
            ],
            "fields": {
                "type": "apache"
            }
        }
    ]
}

Restart the service to reload the configuration:

$ sudo service logstash-forwarder restart

Process the logs in the logstash indexer

Add the following input definition to the configuration file:

/etc/logstash/conf.d/00_input.conf

lumberjack {
           port => 5000
           tags => ["apache_access_json"]
           codec => "json"
           ssl_certificate => "/etc/ssl/logstash.pub"
           ssl_key => "/etc/ssl/logstash.key"
}

This configuration tells logstash to listen for and receive events via the lumberjack protocol. It will listen on port 5000, use the json codec to process events and automatically add an apache_access_json tag to each.

See http://logstash.net/docs/1.4.1/inputs/lumberjack for more information on this input.

/etc/logstash/conf.d/02_filter_apache.conf

filter {
       if "apache_access_json" in [tags] {

          if [useragent] != "-" and [useragent] != "" {
             useragent {
                       add_tag => [ "UA" ]
                       source => "useragent"
                       prefix => "UA-"
             }
          }

          mutate {
                 convert => ['duration', 'float']
          }

          ruby {
               code => "event['duration']/=1000000"
          }

          if [bytes] == 0 { mutate { remove_field => "[bytes]" } }
          if [urlquery]              == "" { mutate { remove_field => "urlquery" } }
          if [method]    =~ "(HEAD|OPTIONS)" { mutate { remove_field => "method" } }
          if [useragent] == "-"              { mutate { remove_field => "useragent" } }
          if [referer]   == "-"              { mutate { remove_field => "referer" } }

          if "UA" in [tags] {
             if [UA-device] == "Other" { mutate { remove_field => "UA-device" } }
             if [UA-name]   == "Other" { mutate { remove_field => "UA-name" } }
             if [UA-os]     == "Other" { mutate { remove_field => "UA-os" } }
          }
       }
}

This configuration file will apply some filters on events tagged as apache_access_json.

You’ll notice fields such as bytes, useragent, duration… These fields are automatically set by logstash when the event is received using the json codec.

Now, you need to restart logstash to apply the changes:

$ sudo service logstash restart

Centralized logging with an ELK stack (Elasticsearch-Logstash-Kibana) on Ubuntu

Update 22/12/2015

I’ve reviewed the book Learning ELK Stack by Packt Publishing; it’s available online for only $5: https://www.packtpub.com/big-data-and-business-intelligence/learning-elk-stack/?utm_source=DD-deviantonywp&utm_medium=referral&utm_campaign=OME5D2015

I’ve recently setup an ELK stack in order to centralize the logs of many services in my company, and it’s just amazing !

I’ve used the following versions of the software on Ubuntu 12.04 (it also works on Ubuntu 14.04):

  • Elasticsearch 1.4.1
  • Kibana 3.1.2
  • Logstash 1.4.2
  • Logstash-forwarder 0.3.1

About the softwares

Elasticsearch

Elasticsearch is a RESTful distributed search engine using a NoSQL database and based on the Apache Lucene engine. It is developed by the Elasticsearch company, which also maintains Kibana and Logstash.

Elasticsearch Homepage

Logstash

Logstash is a tool used to harvest and filter logs; it runs on the JVM and is released under the Apache 2.0 license.

Logstash Homepage

Logstash-forwarder

Logstash-forwarder (previously named Lumberjack) is one of the many log shippers compatible with Logstash.

It has the following advantages:

  • a light footprint (written in Go, no need for a Java Virtual Machine to harvest logs)
  • uses a data compression algorithm
  • uses encryption to send data over the network

Logstash-forwarder Homepage

Kibana

Kibana is a web UI that lets you search and display the data stored by Logstash in Elasticsearch.

Kibana Homepage

Architecture

Here is a simple diagram of the expected architecture: we will use logstash-forwarder (using the lumberjack protocol) on each server where we want to harvest the logs. These nodes will send data to the indexer: logstash. The indexer will process the events using filters and send the formatted data to elasticsearch.

Kibana, the UI, will allow you to display and compile the data. This architecture is scalable: you can quickly add more indexer nodes by adding logstash instances. Same for elasticsearch, which works as a cluster of one node by default.

Setup the Elasticsearch node

NOTE: A one node elasticsearch cluster is not recommended for production, I’ve added another blog post describing the setup of a three node cluster on Ubuntu 12.04, see How to setup an Elasticsearch cluster with Logstash on Ubuntu 12.04.

Requirements

Elasticsearch runs on Java; you need to ensure you’ve got a JDK installed on your system and that it is available in the PATH.

Install via repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo &quot;deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main&quot; | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update &amp;&amp; sudo apt-get install elasticsearch

You can also decide to start the elasticsearch service on boot using the following command:

$ sudo update-rc.d elasticsearch defaults 95 10

Configuration

You must specify the path to the Java JDK in the file /etc/default/elasticsearch to start the service by adding the following variable:

JAVA_HOME=/path/to/java/JDK

If you want to tune your elasticsearch installation, the configuration is available in the file /etc/elasticsearch/elasticsearch.yml.

You can now start the service:

$ sudo service elasticsearch start
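You can check that the node answers with a simple HTTP request, which should return a small JSON document containing the node name and version:

$ curl http://localhost:9200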

Automatic index cleaning via Curator

You can use the curator program to delete indexes. See more information in the github repository: https://github.com/elasticsearch/curator

You’ll need pip in order to install curator:

$ sudo apt-get install python-pip

Once it’s done, you can install curator:

$ sudo pip install elasticsearch-curator

Now, it’s easy to setup a cron to delete the indexes older than 30 days in /etc/cron.d/elasticsearch_curator:

@midnight     root        curator delete --older-than 30 >> /var/log/curator.log 2>&1

Cluster overview via Marvel

NOTE: marvel needs to be installed on each node of an elasticsearch cluster in order to supervise the whole cluster.

See marvel’s homepage for more info.

Install it:

$ /usr/share/elasticsearch/bin/plugin -i elasticsearch/marvel/latest

Restart the elasticsearch service:

$ sudo service elasticsearch restart

Now you can access the marvel UI via your browser on : http://elasticsearch-host:9200/_plugin/marvel

Setup the Logstash node

Requirements

Logstash runs on Java; you need to ensure you’ve got a JDK installed on your system and that it is available in the PATH.

Install via repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo &quot;deb http://packages.elasticsearch.org/logstash/1.4/debian stable main&quot; | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update &amp;&amp; sudo apt-get install logstash

Generate a SSL certificate

Use the following command to generate a self-signed SSL certificate and private key in /etc/ssl:

$ openssl req -x509 -newkey rsa:2048 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 1095
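Note that logstash-forwarder validates the server certificate against the hostname it connects to, so the certificate’s Common Name should match the name used in the forwarder configuration (logstash-node in my recipes). You can set it directly on the command line, for example:

$ openssl req -x509 -newkey rsa:2048 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 1095 -subj "/CN=logstash-node"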

Configuration

By default, the logstash configuration lives in the /etc/logstash/conf.d directory. As the configuration can become quite messy over time, I’ve split it into multiple files:

  • 00_input.conf
  • 02_filter_*.conf
  • 10_output.conf

This allows you to define separated sections for the logstash configuration:

Input section

You’ll define here all the inputs for the indexer. An input is a source from which logstash reads events; it can be a file, a message queue connection… We are going to use the lumberjack input to communicate with the logstash-forwarder harvesters.

See: http://logstash.net/docs/1.4.2/inputs/lumberjack for more information.
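As an illustration, a minimal input section for the lumberjack protocol could look like this (the recipe posts use variations of it):

/etc/logstash/conf.d/00_input.conf

input {
    lumberjack {
        port => 5000
        ssl_certificate => "/etc/ssl/logstash.pub"
        ssl_key => "/etc/ssl/logstash.key"
    }
}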

Filter section

Filters are processing methods applied to the received events. For example, you can apply a calculation to some numeric value, or drop a specific event based on its text value… There are a LOT of filters you can use with logstash, see the documentation for more information.

I recommend using a specific configuration file for each service you want to process: 02_filter_apache.conf, 02_filter_mysql.conf

Output section

NOTE: There is another way to configure the logstash integration with an elasticsearch cluster; it’s more suitable if you have more than one node in your cluster, see How to setup an Elasticsearch cluster with Logstash on Ubuntu 12.04.

The output defines where logstash will send the processed events. We are going to use elasticsearch as the output destination:

/etc/logstash/conf.d/10_output.conf

output {
       elasticsearch {
                     host =&gt; &quot;elasticsearch-node&quot;
                     protocol =&gt; &quot;http&quot;
       }
}

Optional: Logstash contrib plugins

Starting from version 1.4 of logstash, some plugins have been split out of the project. A new project was born: logstash-contrib, gathering a lot of plugins inside one bundle.

See http://logstash.net/docs/1.4.2/contrib-plugins for more information. It’s also available on Github: https://github.com/elasticsearch/logstash-contrib.

Installation:

$ /opt/logstash/bin/plugin install contrib

Setup the Kibana node

Requirements

You’ll need a web server to use kibana, I’ve chosen apache:

$ sudo apt-get install apache2

Installation

You can retrieve the latest archive for kibana here: http://www.elasticsearch.org/overview/kibana/installation/

Download and extract the archive in your webserver root (replace VERSION with the right version):

$ wget https://download.elasticsearch.org/kibana/kibana/kibana-VERSION.tar.gz
$ sudo tar xvf kibana-*.tar.gz -C /var/www
$ sudo mv /var/www/kibana-VERSION /var/www/kibana

Now setup the default dashboard:

$ sudo cp /var/www/kibana/app/dashboards/default.json /var/www/kibana/app/dashboards/default.json.bak
$ sudo mv /var/www/kibana/app/dashboards/logstash.json /var/www/kibana/app/dashboards/default.json

Update the elasticsearch value in /var/www/kibana/config.js to match your elasticsearch node:

      elasticsearch: &quot;http://elasticsearch-host:9200&quot;,

You can now access the kibana UI: http://kibana-host/kibana

Setup the Logstash forwarder on a node

Install via repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo &quot;deb http://packages.elasticsearch.org/logstashforwarder/debian stable main&quot; | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update &amp;&amp; sudo apt-get install logstash-forwarder

Configuration

You’ll need to copy the public certificate generated previously on the logstash node to the same path: /etc/ssl/logstash.pub

The configuration for the logstash-forwarder is defined in the file /etc/logstash-forwarder. You can see examples in my logstash recipes posts.

Logstash recipes

Here is a list of posts I’ve made for logstash recipes; they contain both logstash and logstash-forwarder configuration samples:

You can also check my post on how to debug your logstash filters.

Enjoy your logs !