How to set up an Elasticsearch cluster with Logstash on Ubuntu 12.04

Hey there!

I’ve recently hit the limitations of a one-node Elasticsearch cluster in my ELK setup (see my previous blog post: Centralized logging with an ELK stack (Elasticsearch-Logstash-Kibana) on Ubuntu).

After more research, I’ve decided to upgrade the stack architecture, more precisely the Elasticsearch cluster and the way Logstash integrates with it.

I’ve been using the following software versions:

  • Elasticsearch 1.4.1
  • Logstash 1.4.2

Set up the Elasticsearch cluster

You’ll need to apply this procedure on each Elasticsearch node.

Java

I’ve decided to install the Oracle JDK instead of OpenJDK, using the following PPA:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update && sudo apt-get install oracle-java7-installer

If the add-apt-repository command is missing, make sure the python-software-properties package is installed:

$ sudo apt-get install python-software-properties

Install via the Elasticsearch repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install elasticsearch

You can also configure the Elasticsearch service to start on boot with the following command:

$ sudo update-rc.d elasticsearch defaults 95 10

Configuration

Edit the Elasticsearch configuration file /etc/elasticsearch/elasticsearch.yml and update the following parameters:

  • cluster.name: my-cluster-name

I suggest replacing the default cluster name (elasticsearch) with a name of your own, especially if another cluster with multicast enabled is running on your network: with multicast, nodes sharing the same cluster name will join each other.

  • index.number_of_replicas: 2

With 2 replicas, each shard has 3 copies (1 primary + 2 replicas), so on a 3-node cluster every node holds a copy of your data. In general, set this property to N-1, where N is the number of nodes in your cluster.
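To make the N-1 rule concrete, here’s a small POSIX shell sketch (the function names are my own, not part of Elasticsearch) that computes the replica count for a given number of nodes and the resulting number of copies per shard. Elasticsearch never allocates two copies of the same shard on one node, so N copies spread over N nodes means one copy per node.

```shell
#!/bin/sh
# Hypothetical helper: replica count so that each of N data nodes holds a copy.
replicas_for_nodes() {
  nodes=$1
  if [ "$nodes" -lt 1 ]; then
    echo "node count must be at least 1" >&2
    return 1
  fi
  echo $((nodes - 1))
}

# Hypothetical helper: copies of each shard = 1 primary + the replicas.
copies_per_shard() {
  echo $(($1 + 1))
}

# Example for a 3-node cluster
r=$(replicas_for_nodes 3)
echo "index.number_of_replicas: $r"                  # prints: index.number_of_replicas: 2
echo "copies per shard: $(copies_per_shard "$r")"    # prints: copies per shard: 3
```

Keep in mind the trade-off: with N-1 replicas, every node stores the full data set, so disk usage grows with each node you add.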

  • gateway.recover_after_nodes: 2

This ensures the recovery process only starts once at least 2 nodes in the cluster are up.

  • discovery.zen.minimum_master_nodes: 2

Set this to N/2 + 1 (using integer division), where N is the number of master-eligible nodes in your cluster (by default, every node); with 3 nodes, that gives 2. This avoids the “split-brain” scenario.

See this post for more information on this scenario: http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/
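The quorum formula is easy to get wrong for even node counts, so here’s a tiny sketch that prints it for a few cluster sizes. The function is a hypothetical helper of mine; shell arithmetic uses integer division, which rounds down.

```shell
#!/bin/sh
# Hypothetical helper: quorum for discovery.zen.minimum_master_nodes,
# i.e. N/2 + 1 with integer division, N = number of master-eligible nodes.
min_master_nodes() {
  masters=$1
  echo $((masters / 2 + 1))
}

for n in 1 2 3 4 5; do
  echo "$n nodes -> discovery.zen.minimum_master_nodes: $(min_master_nodes "$n")"
done
# prints a quorum of 1, 2, 2, 3 and 3 for N = 1..5
```

Note the 2-node case: the quorum is 2, so if either node goes down the remaining one can’t elect a master. That’s one reason 3 nodes is a common minimum.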

Disabling multicast

Multicast is not recommended in production; disabling it gives you more control over your cluster:

  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.ping.unicast.hosts: ["host-1", "host-2"]

Of course, on each node you’ll need to list the 2 other hosts of the cluster:

  • host-1 will communicate with host-2 & host-3
  • host-2 will communicate with host-1 & host-3
  • host-3 will communicate with host-1 & host-2
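With more than a handful of nodes, writing these lists by hand gets error-prone. Here’s a small sketch that builds the line for a given node from the full member list; the function name and the host names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the unicast hosts line for SELF,
# listing every cluster member except SELF.
# Usage: unicast_hosts_line SELF HOST...
unicast_hosts_line() {
  self=$1
  shift
  list=""
  for h in "$@"; do
    if [ "$h" != "$self" ]; then
      # Append ", " only when the list already has an entry.
      list="${list:+$list, }\"$h\""
    fi
  done
  echo "discovery.zen.ping.unicast.hosts: [$list]"
}

for node in host-1 host-2 host-3; do
  echo "$node -> $(unicast_hosts_line "$node" host-1 host-2 host-3)"
done
```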

Cluster overview via Marvel

Marvel is free for development use! See Marvel’s homepage for more info.

Install it:

$ sudo /usr/share/elasticsearch/bin/plugin -i elasticsearch/marvel/latest

Restart the Elasticsearch service:

$ sudo service elasticsearch restart

You can now access the Marvel UI in your browser on any of your Elasticsearch nodes.
For example, on the first node: http://elasticsearch-host-a:9200/_plugin/marvel

Automatic index cleaning via Curator

This tool only needs to be installed on one node.

You can use the Curator program to delete old indices. See the GitHub repository for more information: https://github.com/elasticsearch/curator

You’ll need pip in order to install curator:

$ sudo apt-get install python-pip

Once it’s done, you can install curator:

$ sudo pip install elasticsearch-curator

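Now it’s easy to set up a cron job that deletes indices older than 30 days. Create /etc/cron.d/elasticsearch_curator with the following line (pip installs curator in /usr/local/bin, which isn’t in cron’s default PATH, so use the full path):

@midnight     root        /usr/local/bin/curator delete --older-than 30 >> /var/log/curator.log 2>&1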
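Since Logstash creates one index per day, named logstash-YYYY.MM.DD by default, “older than 30 days” boils down to comparing those dates against a cutoff. The sketch below only illustrates that idea on a list of index names and assumes GNU date (the Ubuntu default). It doesn’t contact Elasticsearch, it isn’t how Curator is implemented, and Curator’s exact boundary handling may differ.

```shell
#!/bin/sh
# Illustration only (hypothetical helper): print the daily Logstash indices
# dated before the cutoff, i.e. REFERENCE_DATE minus DAYS. Requires GNU date.
# Usage: indices_older_than DAYS REFERENCE_DATE INDEX...
indices_older_than() {
  days=$1
  ref=$2
  shift 2
  cutoff=$(date -d "$ref $days days ago" +%Y%m%d) || return 1
  for idx in "$@"; do
    case "$idx" in
      logstash-[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]) ;;
      *) continue ;;   # skip anything that isn't a daily Logstash index
    esac
    # logstash-2015.01.01 -> 20150101, so dates compare as integers
    stamp=$(echo "${idx#logstash-}" | tr -d '.')
    if [ "$stamp" -lt "$cutoff" ]; then
      echo "$idx"
    fi
  done
}

indices_older_than 30 2015-01-31 logstash-2014.12.31 logstash-2015.01.01 logstash-2015.01.15
# prints: logstash-2014.12.31
```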

Set up the Logstash node

Java

Logstash runs on Java, so make sure a JDK is installed on your system: either OpenJDK or the Oracle JDK will do.

Install via the Elasticsearch repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/logstash/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install logstash

Generate an SSL certificate

Use the following command to generate a self-signed SSL certificate and its private key in /etc/ssl:

$ sudo openssl req -x509 -newkey rsa:2048 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 1095
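The command above prompts for the certificate fields interactively. If you want to script it, or just check the result, here’s a sketch that passes the subject with -subj and then verifies that the certificate and key belong together by comparing their RSA moduli. The CN value logstash is a placeholder (use the hostname your shippers will connect to), and the function names are my own.

```shell
#!/bin/sh
# Hypothetical helper: same certificate as above, generated non-interactively in DIR.
make_cert() {
  dir=$1
  openssl req -x509 -newkey rsa:2048 -nodes -days 1095 \
    -subj "/CN=logstash" \
    -keyout "$dir/logstash.key" -out "$dir/logstash.pub" 2>/dev/null || return 1
  chmod 600 "$dir/logstash.key"   # private key: readable by its owner only
}

# Hypothetical helper: succeed if CERT and KEY share the same RSA modulus.
cert_matches_key() {
  cert_mod=$(openssl x509 -noout -modulus -in "$1")
  key_mod=$(openssl rsa -noout -modulus -in "$2" 2>/dev/null)
  [ -n "$cert_mod" ] && [ "$cert_mod" = "$key_mod" ]
}
```

Try it against a scratch directory first, e.g. make_cert "$(mktemp -d)", before writing to /etc/ssl as root. You can also print the expiry date with openssl x509 -noout -enddate -in /etc/ssl/logstash.pub.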

Configuration

I’ll skip the input and filter configuration and only cover the output configuration used to communicate with the Elasticsearch cluster.

We’re no longer going to point Logstash at a specific Elasticsearch host; instead, this Logstash instance will join the cluster itself.

/etc/logstash/conf.d/10_output.conf

output {
  elasticsearch { }
}

Then edit the Logstash init script in /etc/init/logstash.conf and set the LS_JAVA_OPTS variable to:

LS_JAVA_OPTS="-Djava.io.tmpdir=${LS_HOME} -Des.config=/etc/logstash/elasticsearch.yml"
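If you’d rather script that edit than open an editor, here’s a sketch using GNU sed -i. It assumes the file already contains a line starting with LS_JAVA_OPTS= (leading whitespace allowed); if your copy of the script sets the variable in a different form, edit it by hand instead. The function name is hypothetical, and it keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the LS_JAVA_OPTS="..." line in FILE in place.
set_ls_java_opts() {
  file=$1
  # Single quotes keep ${LS_HOME} literal, so the init script expands it at runtime.
  opts='-Djava.io.tmpdir=${LS_HOME} -Des.config=/etc/logstash/elasticsearch.yml'
  if ! grep -q '^[[:space:]]*LS_JAVA_OPTS=' "$file"; then
    echo "no LS_JAVA_OPTS= line found in $file" >&2
    return 1
  fi
  cp "$file" "$file.bak"   # backup of the original
  # Keep the original indentation (\1) and replace the whole assignment.
  sed -i "s|^\([[:space:]]*\)LS_JAVA_OPTS=.*|\1LS_JAVA_OPTS=\"$opts\"|" "$file"
}
```

Run it as root against /etc/init/logstash.conf, then diff the result with the .bak file before restarting anything.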

And create the file /etc/logstash/elasticsearch.yml with the following content:

cluster.name: my-cluster-name
node.name: logstash-indexer-01

If you’ve disabled multicast, then you’ll need to add the following line:

discovery.zen.ping.unicast.hosts: ["elasticsearch-node-1", "elasticsearch-node-2", "elasticsearch-node-3"]
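If you provision several Logstash indexers, generating this file avoids typos in the host list. Here’s a sketch that prints its content from a cluster name, a node name and an optional list of Elasticsearch hosts; the function name is my own, and with no hosts the unicast line is left out for multicast setups.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/logstash/elasticsearch.yml content.
# Usage: logstash_es_yml CLUSTER NODE_NAME [ES_HOST...]
logstash_es_yml() {
  cluster=$1
  node=$2
  shift 2
  echo "cluster.name: $cluster"
  echo "node.name: $node"
  if [ "$#" -gt 0 ]; then
    hosts=""
    for h in "$@"; do
      hosts="${hosts:+$hosts, }\"$h\""
    done
    echo "discovery.zen.ping.unicast.hosts: [$hosts]"
  fi
}
```

For example: logstash_es_yml my-cluster-name logstash-indexer-01 elasticsearch-node-1 elasticsearch-node-2 elasticsearch-node-3 | sudo tee /etc/logstash/elasticsearch.yml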

And start logstash:

$ sudo service logstash start

The Logstash node will automatically join your Elasticsearch cluster.
You can verify this by checking the node count and versions in the Marvel UI.
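If you prefer the command line to Marvel, the _cluster/health endpoint reports a number_of_nodes field. Here’s a minimal sketch that pulls it out with sed. It assumes curl is installed, a node answers on localhost:9200, and the response is the default compact JSON; sed is not a JSON parser, so treat this as a quick check only.

```shell
#!/bin/sh
# Hypothetical helper: read a _cluster/health JSON response on stdin
# and print its number_of_nodes value.
number_of_nodes() {
  sed -n 's/.*"number_of_nodes":\([0-9]*\).*/\1/p'
}

# Usage (needs a running cluster):
# curl -s http://localhost:9200/_cluster/health | number_of_nodes
```

Once Logstash has joined, the count should be one higher than the number of Elasticsearch nodes you set up.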
