The ELK stack powered by Docker – Updated !

Hola,

In a previous post, I introduced the ELK stack powered by Docker & Fig (see the ELK stack powered by Docker).

I recently decided to update the project: Compose replaces fig, and the latest official images replace all my custom images!

It is now based on the official Elasticsearch, Logstash and Kibana images available on Docker Hub.

01/11/2015 : Project updated !

Because the project tracks the latest versions of the official Docker images, it now runs Elasticsearch 2.x, Logstash 2.x and Kibana 4.2.x! Feel free to discover the new features of these releases (have a look here: https://www.elastic.co/blog/release-we-have).

Note: For the nostalgic folks, you can still access the 1.x version (Elasticsearch 1.x, Logstash 1.x and Kibana 4.1.x) on the 1.x branch ! Here it is: https://github.com/deviantony/docker-elk/tree/1.x

Usage

Pre-requisites

You’ll need Docker and Docker Compose.

The following installation procedures have been tested on Ubuntu 14.04.

Docker installation

Use the following command to install Docker:

$ curl -sSL https://get.docker.com/ubuntu/ | sudo sh

Docker Compose installation

Follow the procedure available at https://docs.docker.com/compose/install/ to install the latest version of Docker Compose.

Use the stack

First, you’ll need to clone the Git repository:

$ git clone https://github.com/deviantony/docker-elk.git

By default, the stack ships with a simple Logstash configuration that listens for any TCP input on port 5000.

Then start the stack using Compose:

$ cd docker-elk
$ docker-compose up

Compose will start a container for each service of the ELK stack and output their logs.

If you’re still using the default input configuration for Logstash, you can inject some data into Elasticsearch from a file:

$ nc localhost 5000 < /some/log/file.log

Then you can check the results in Kibana by hitting the following URL in your browser: http://localhost:5601
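If you have no log file at hand, a single test line is enough to check the pipeline end to end. The sketch below is only an illustration and not part of the project: make_test_line is a hypothetical helper, and docker-elk-test is an arbitrary marker chosen so that the event is easy to find in Kibana.

```shell
# Hypothetical helper: print one timestamped line carrying an easily
# searchable marker, followed by the message given as first argument.
make_test_line() {
    printf '%s docker-elk-test %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1"
}

# Show the line that would be sent.
make_test_line "hello from nc"

# To send it to the default Logstash TCP input (the stack must be running):
#   make_test_line "hello from nc" | nc localhost 5000
```

Assuming the default configuration is still in place, searching for docker-elk-test in Kibana should then bring up the event.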

Enjoy 🙂

Scaffold your Puppet modules !

Hej,

I’ve recently worked with a project scaffolding tool called Yeoman (see Yeoman homepage).

My first experiments were with an AngularJS generator used to bootstrap an AngularJS web application, and I must say that the tool does its job well.

Right after that, I began to think about how I could use this bootstrapping tool in my company, where we write a lot of Puppet modules (we have written more than 80 at the time of writing).

To ease the pain of module creation and introduce standardisation right from the start, I’ve created a Puppet module generator.

You’ll need Node.js and npm in order to use it, and I recommend using Node Version Manager (nvm).
For example, here is how to install nvm and a recent version of Node.js:

$ curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.25.4/install.sh | bash
$ nvm install 0.12.7

That’s it, you’ve installed Node.js v0.12.7. Now let’s use it and install the Puppet module generator!

$ nvm use 0.12.7
$ npm install -g generator-puppet-module

The generator is now installed. All you need to do is to use it:

$ yo puppet-module myModuleName

Note: You’ll need to repeat the nvm use step if you have opened a new shell session.

It will prompt you for an author name and a company name (these parameters will be remembered for the next execution) and will then set up everything required for a proper Puppet module in your current directory!

Have fun with it, and do not hesitate to open issues/pull requests on the project: https://github.com/deviantony/yeoman-puppet

Puppet and Foreman : infrastructure as legos

Ahoj,

It’s been a while since I posted my last article. I began writing this one 6 months ago, when I wanted to prepare a simple introduction to Puppet and Foreman for all the curious people at my workplace.

We’ve been very busy with other subjects recently and I gave up on the subject… but I’ve recently decided to finish it because I hate things left undone !

The idea is to give an introduction to Puppet and Foreman (and especially Foreman and how to manage a set of servers with it) using a simple analogy: you can build your servers as if you were building constructs with Legos.

Everybody has played with Legos, right? Time to play again !

Infrastructure as legos

First, let’s define what we are trying to achieve here.

We need to manage our server infrastructure. In our context, we’ll compare a server (either physical or virtual) with a Lego model, such as this car below.

[Image: lego-blocks-node]

What we want is:

  • Produce as many of these models as we need
  • Customize the attributes of these models (colours, number of wheels…)

To sum up, we want to provision & configure our servers.

Puppet

Let’s first recap what Puppet is. As stated on their website:

Puppet is a configuration management system that allows you to define the state of your IT infrastructure, then automatically enforces the correct state.

It is a tool that will help you automate your infrastructure management and sysadmin tasks.

To start with the Lego comparison, Puppet is the open-source Lego company, providing schematics to create your own block blueprints. It also provides a workshop where you can share and retrieve blueprints created by other people: the Puppet Forge.

The Foreman

The Foreman is a tool that can manage your servers’ lifecycle, from creation and configuration to destruction. In this analogy, it will be used in two different ways:

  1. Our blueprints catalog, as the Puppet ENC (External Node Classifier)
  2. Our model factory, as our provisioning system

Puppet module

A Puppet class should be defined as a simple Lego block: it should do one thing and do it well! Blocks can have different shapes, colors, sizes…

[Image: lego-block-class]

Foreman configuration group

A Foreman configuration group is a logical group of Puppet classes. It is not tied to an environment, so you can put any classes in it. I tend to see it as a logical group of blocks used to build my hostgroups / nodes.

[Image: lego-blocks-config-groups]

Puppet environment

A Puppet environment is a set of Lego blocks available for the creation of configuration groups, hostgroups and nodes ! It is a list of classes in a defined state.

[Image: lego-blocks-env]

Foreman hostgroup

A Foreman hostgroup is a container for multiple Puppet modules and/or configuration groups.

A hostgroup is a model structure that will be used by your Foreman nodes as their base. Note that you can nest hostgroups to inherit from previously defined structures.

[Image: lego-blocks-hostgroup]

Foreman node

This is it, this is our server. At this point, we have the base structure and we can add other blocks and choose their colors (by specifying parameter values).

A Foreman node is a server. In the Lego way, it is the final structure you want to create, using the blocks available inside a specific environment, a hostgroup as the base structure, and any configuration groups.

[Image: lego_blocks_node]

Have fun !

The ELK stack powered by Docker

UPDATE: The stack is now powered by docker-compose and using the latest official images for Elasticsearch/Logstash/Kibana. See my new article https://deviantony.wordpress.com/2015/07/23/the-elk-stack-powered-by-docker-updated/ — 23/07/2015

Ahoy,

I’ve recently created a solution to set up an ELK stack using the Docker engine. It can be used to:

  • Quickly boot an ELK stack for demo purposes
  • Use it to test your Logstash configurations and see the imported data in Elasticsearch
  • Import a subset of data into Elasticsearch and prepare dashboards on Kibana 3
  • Use Kibana 4 !

UPDATE: The stack is now fully functional with docker-compose as a replacement for fig. See http://docs.docker.com/compose/install/ for docker-compose installation. (27/02/2015)

The solution is available on GitHub: https://github.com/deviantony/docker-elk

It is based on multiple Docker images available on Docker Hub:

  • elk-elasticsearch: Latest 1.5 stable version of Elasticsearch + Marvel with Oracle Java JDK 7
  • elk-logstash: Latest 1.4 stable version of Logstash with Oracle Java JDK 7
  • elk-kibana: Kibana 3.1.2 or Kibana 4

Prerequisites

You’ll need Docker and Fig.

The following installation procedures have been tested on Ubuntu 14.04.

Docker installation

Use the following command to install Docker:

$ curl -sSL https://get.docker.com/ubuntu/ | sudo sh

Fig installation

Fig is available via pip, a tool for managing Python packages.

First, you’ll need to install pip if it is not already present on your system:

$ sudo apt-get install python-pip

Then, install fig:

$ sudo pip install -U fig

Use the stack

First, you’ll need to clone the Git repository:

$ git clone https://github.com/deviantony/fig-elk.git

By default, the stack ships with a simple Logstash configuration that listens for any TCP input on port 5000.

You can update the Logstash configuration by updating the file logstash-conf/logstash.conf (to test your filters, for example).

Then start the stack using fig:

$ cd fig-elk
$ fig up

Fig will start a container for each service of the ELK stack and output their logs.

If you’re still using the default input configuration for Logstash, you can inject some data into Elasticsearch from a file:

$ nc localhost 5000 < /some/log/file.log

Then you can check the results in Kibana 3, by hitting the following URL in your browser: http://localhost:8080

Or, if you’d like to use Kibana 4, hit the following URL: http://localhost:5601

Elasticsearch also ships with Marvel, so you have access to cluster monitoring at the following URL: http://localhost:9200/_plugin/marvel

Have fun with the ELK stack 🙂

VMware tips

Here are a few tips I’ve learned with VMware virtual machines hosting Ubuntu 12.04 / 14.04.

How to refresh an upgraded disk

If you want to resize a disk at runtime, go to your VM preferences and update your disk size. Now, how to refresh the state of the disk in the virtual machine without rebooting it?

For each disk in your VM, there is a related directory in the /sys/class/scsi_device directory. Use the following command as root to rescan your disk capacity, replacing DISK with the entry for your disk (for example 0:0:0:0):

$ echo 1 > /sys/class/scsi_device/DISK/device/rescan

How to detect a new disk

If you want to add another disk at runtime, go to your VM preferences and add a new disk. Now, how to refresh the system so it can see the new disk without rebooting the VM?

Basically, you just need to rescan the SCSI bus to which the storage devices are connected.

The following commands assume you have root access on the system.

Find your host bus number

$ grep mpt /sys/class/scsi_host/host?/proc_name

This will return a line like the following:

/sys/class/scsi_host/host0/proc_name:mptspi

Where host0 is the relevant bus number.

Rescan the SCSI bus

$ echo "- - -" > /sys/class/scsi_host/host0/scan
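The two steps above can be chained so that the bus number does not have to be copied by hand. This is only a sketch: scsi_host_from_line and rescan_mpt_hosts are hypothetical helper names, and it assumes, as in the example above, that the relevant adapters report a driver name containing "mpt".

```shell
# Hypothetical helper: extract the hostN name from a grep result line such as
# /sys/class/scsi_host/host0/proc_name:mptspi
scsi_host_from_line() {
    echo "$1" | sed -n 's|^/sys/class/scsi_host/\(host[0-9][0-9]*\)/proc_name:.*|\1|p'
}

# Hypothetical helper: rescan every adapter whose driver name contains "mpt".
# Must be run as root. -H forces grep to print the file name even when
# only one host entry exists.
rescan_mpt_hosts() {
    grep -H mpt /sys/class/scsi_host/host*/proc_name 2>/dev/null |
    while read -r line; do
        host=$(scsi_host_from_line "$line")
        if [ -n "$host" ]; then
            echo "- - -" > "/sys/class/scsi_host/$host/scan"
        fi
    done
}

# Parsing the sample line shown above:
scsi_host_from_line "/sys/class/scsi_host/host0/proc_name:mptspi"

# As root, trigger the rescan with:
#   rescan_mpt_hosts
```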

RabbitMQ and HAProxy: a timeout issue

If you’re trying to set up a highly available RabbitMQ cluster using HAProxy, you may run into disconnection issues on your clients.

This problem is caused by HAProxy’s timeout client parameter (clitimeout is deprecated), which sets the client inactivity timeout. If a connection stays idle for longer than timeout client (in milliseconds by default), HAProxy drops it.

RabbitMQ clients use persistent connections to a broker, which never time out. See the problem here? If your RabbitMQ client is inactive for a period of time, HAProxy will automatically close the connection.

So how do we solve the problem? I’ve seen that HAProxy has a clitcpka option which enables the sending of TCP keepalive packets on the client side.

Let’s use it !

But it doesn’t solve the problem: the disconnection issues are still there. Damn.

While reading a discussion about RabbitMQ and HAProxy on the RabbitMQ mailing list, I found that Tim Watson had pointed out that:

[…]the exact behaviour of tcp keep-alive is determined by the underlying OS/Kernel configuration[…]

On Ubuntu 14.04, the tcp man page shows that the default value for the tcp_keepalive_time parameter is 2 hours. This parameter defines the time a connection needs to be idle before TCP begins sending out keep-alive packets.

You can also verify it by using the following command:

$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200

OK! Let’s raise the timeout client value in our HAProxy configuration for AMQP; 3 hours should be good. And that’s it! No more disconnection issues 🙂
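The comparison behind that choice (an HAProxy timeout longer than the kernel’s keepalive idle time) can be sketched in shell. Both helpers below are hypothetical and not part of HAProxy. They assume HAProxy’s usual time format, where a bare number is in milliseconds, and they only handle the ms, s, m, h and d suffixes.

```shell
# Hypothetical helper: convert an HAProxy time value to whole seconds.
# A bare number is treated as milliseconds (HAProxy's default unit for timeouts).
haproxy_to_seconds() {
    case "$1" in
        *ms) echo $(( ${1%ms} / 1000 )) ;;
        *s)  echo $(( ${1%s} )) ;;
        *m)  echo $(( ${1%m} * 60 )) ;;
        *h)  echo $(( ${1%h} * 3600 )) ;;
        *d)  echo $(( ${1%d} * 86400 )) ;;
        *)   echo $(( $1 / 1000 )) ;;
    esac
}

# Hypothetical helper: succeeds when the HAProxy timeout ($1) is greater than
# the keepalive idle time in seconds ($2), which is the comparison made above.
timeout_outlasts_keepalive() {
    timeout_s=$(haproxy_to_seconds "$1")
    [ "$timeout_s" -gt "$2" ]
}

# Read the kernel value, falling back to the 7200 shown above if unavailable.
keepalive=$(cat /proc/sys/net/ipv4/tcp_keepalive_time 2>/dev/null || echo 7200)

for value in 50000 3h; do
    if timeout_outlasts_keepalive "$value" "$keepalive"; then
        echo "timeout client $value: longer than tcp_keepalive_time (${keepalive}s)"
    else
        echo "timeout client $value: shorter than tcp_keepalive_time (${keepalive}s)"
    fi
done
```

With the default of 7200 seconds, the 50000 ms value from the defaults section falls well short, while 3h (10800 seconds) leaves a comfortable margin.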

Here is a sample HAProxy configuration:

global
        log 127.0.0.1   local1
        maxconn 4096
        #chroot /usr/share/haproxy
        user haproxy
        group haproxy
        daemon
        #debug
        #quiet

defaults
        log     global
        mode    tcp
        option  tcplog
        retries 3
        option redispatch
        maxconn 2000
        timeout connect 5000
        timeout client 50000
        timeout server 50000

listen  stats :1936
        mode http
        stats enable
        stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /

listen amqp_front :5672
        mode            tcp
        balance         roundrobin
        timeout client  3h
        timeout server  3h
        option          clitcpka
        server          amqp-1 rabbitmq1.domain:5672  check inter 5s rise 2 fall 3
        server          amqp-2 rabbitmq2.domain:5672  check inter 5s rise 2 fall 3

Enjoy your highly available RabbitMQ cluster !

I think there may be another solution to this problem by using the heartbeat feature of RabbitMQ, see more about that here: https://www.rabbitmq.com/reliability.html

How to setup an Elasticsearch cluster with Logstash on Ubuntu 12.04

Hey there !

I’ve recently hit the limitations of a one-node Elasticsearch cluster in my ELK setup, see my previous blog post: Centralized logging with an ELK stack (Elasticsearch-Logback-Kibana) on Ubuntu

After more research, I’ve decided to upgrade the stack architecture, and more precisely the Elasticsearch cluster and the Logstash integration with the cluster.

I’ve been using the following software versions:

  • Elasticsearch 1.4.1
  • Logstash 1.4.2

Setup the Elasticsearch cluster

You’ll need to apply this procedure on each elasticsearch node.

Java

I’ve decided to install the Oracle JDK instead of OpenJDK, using the following PPA:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update && sudo apt-get install oracle-java7-installer

In case you’re missing the add-apt-repository command, make sure you have the package python-software-properties installed:

$ sudo apt-get install python-software-properties

Install via Elasticsearch repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install elasticsearch

You can also decide to start the elasticsearch service on boot using the following command:

$ sudo update-rc.d elasticsearch defaults 95 10

Configuration

You’ll need to edit the elasticsearch configuration file in /etc/elasticsearch/elasticsearch.yml and update the following parameters:

  • cluster.name: my-cluster-name

I suggest replacing the default cluster name with one of your own, especially if you want to have another cluster on your network with multicast enabled.

  • index.number_of_replicas: 2

This will ensure a copy of your data on every node of your cluster. Set this property to N-1 where N is the number of nodes in your cluster.

  • gateway.recover_after_nodes: 2

This will ensure the recovery process will start after at least 2 nodes in the cluster have been started.

  • discovery.zen.minimum_master_nodes: 2

Should be set to something like N/2 + 1 where N is the number of nodes in your cluster. This is to avoid the “split-brain” scenario.

See this post for more information on this scenario: http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/
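The two sizing rules above (N-1 replicas and N/2 + 1 master nodes) are easy to compute for any cluster size. The helper names in this small sketch are made up for illustration, and the division is an integer division.

```shell
# Hypothetical helper: replicas needed for every node to hold a copy (N - 1).
replicas_for() {
    echo $(( $1 - 1 ))
}

# Hypothetical helper: majority of the nodes, N / 2 + 1 with integer division.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

nodes=3
echo "index.number_of_replicas: $(replicas_for "$nodes")"
echo "discovery.zen.minimum_master_nodes: $(quorum_for "$nodes")"
```

For the three-node cluster used in this post, both values come out as 2, which matches the settings listed above.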

Disabling multicast

Multicast is not recommended in production, disabling it will allow more control over your cluster:

  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.ping.unicast.hosts: ["host-1", "host-2"]

Of course, you’ll need to specify the two other hosts for each node in your cluster:

  • host-1 will communicate with host-2 & host-3
  • host-2 will communicate with host-1 & host-3
  • host-3 will communicate with host-1 & host-2
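The per-node host lists can be generated instead of being written by hand. The sketch below uses a hypothetical helper, unicast_hosts_for, which takes the current node followed by all cluster members and prints the setting with the node itself left out.

```shell
# Hypothetical helper: print the unicast host list for one node, that is,
# every cluster member except the node itself.
# Usage: unicast_hosts_for SELF MEMBER...
unicast_hosts_for() {
    self=$1
    shift
    list=""
    for host in "$@"; do
        [ "$host" = "$self" ] && continue
        list="${list:+$list, }\"$host\""
    done
    echo "discovery.zen.ping.unicast.hosts: [$list]"
}

# Print the line to put in elasticsearch.yml on each of the three nodes.
for node in host-1 host-2 host-3; do
    echo "# $node"
    unicast_hosts_for "$node" host-1 host-2 host-3
done
```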

Cluster overview via Marvel

Marvel is free for development use! See Marvel’s homepage for more info.

Install it:

$ /usr/share/elasticsearch/bin/plugin -i elasticsearch/marvel/latest

Restart the elasticsearch service:

$ sudo service elasticsearch restart

Now you can access the marvel UI via your browser on any of your elasticsearch nodes.
For example the first node: http://elasticsearch-host-a:9200/_plugin/marvel

Automatic index cleaning via Curator

This tool needs to be installed on one node only.

You can use the curator program to delete indexes. See more information in the GitHub repository: https://github.com/elasticsearch/curator

You’ll need pip in order to install curator:

$ sudo apt-get install python-pip

Once it’s done, you can install curator:

$ sudo pip install elasticsearch-curator

Now, it’s easy to set up a cron job in /etc/cron.d/elasticsearch_curator to delete the indexes older than 30 days:

@midnight     root        curator delete --older-than 30 >> /var/log/curator.log 2>&1

Setup the Logstash node

Java

Logstash runs on Java, so you need to make sure a JDK is installed on your system. Use either OpenJDK or Oracle JDK.

Install via repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/logstash/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install logstash

Generate an SSL certificate

Use the following command to generate a self-signed SSL certificate and its private key in /etc/ssl:

$ openssl req -x509 -newkey rsa:2048 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 1095

Configuration

I’ll skip the configuration of inputs and filters and cover only the output configuration used to communicate with the Elasticsearch cluster.

We’re not going to specify an Elasticsearch host anymore; instead, we will specify that this Logstash instance needs to communicate with the cluster.

/etc/logstash/conf.d/10_output.conf

output {
       elasticsearch { }
}

Then we’ll edit the Logstash init script in /etc/init/logstash.conf and update the LS_JAVA_OPTS variable with:

LS_JAVA_OPTS="-Djava.io.tmpdir=${LS_HOME} -Des.config=/etc/logstash/elasticsearch.yml"

And create the file /etc/logstash/elasticsearch.yml with the following content:

cluster.name: my-cluster-name
node.name: logstash-indexer-01

If you’ve disabled multicast, then you’ll need to add the following line:

discovery.zen.ping.unicast.hosts: ["elasticsearch-node-1", "elasticsearch-node-2", "elasticsearch-node-3"]

And start logstash:

$ sudo service logstash start

The logstash node will automatically be added to your elasticsearch cluster.
You can verify that by checking the node count and version in the Marvel UI.
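The node count can also be checked from the command line through the cluster health API, which reports a number_of_nodes field. The sketch below is an illustration only: node_count is a hypothetical helper, the sample JSON is a shortened, made-up response, and the host name in the usage comment is the example one used earlier in this post.

```shell
# Hypothetical helper: read a cluster health response on stdin and print
# the value of its number_of_nodes field.
node_count() {
    sed -n 's/.*"number_of_nodes" *: *\([0-9][0-9]*\).*/\1/p'
}

# Shortened, made-up sample response for three Elasticsearch nodes plus the
# Logstash node.
sample='{"cluster_name":"my-cluster-name","status":"green","number_of_nodes":4,"number_of_data_nodes":3}'
echo "$sample" | node_count

# Against a live cluster:
#   curl -s http://elasticsearch-host-a:9200/_cluster/health | node_count
```

With three Elasticsearch nodes and one Logstash node joined to the cluster, the expected count would be 4.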