How to set up an Elasticsearch cluster with Logstash on Ubuntu 12.04

Hey there!

I’ve recently hit the limitations of a one-node elasticsearch cluster in my ELK setup; see my previous blog post: Centralized logging with an ELK stack (Elasticsearch-Logback-Kibana) on Ubuntu.

After more research, I’ve decided to upgrade the stack architecture, more precisely the elasticsearch cluster and the logstash integration with the cluster.

I’ve been using the following software versions:

  • Elasticsearch 1.4.1
  • Logstash 1.4.2

Set up the Elasticsearch cluster

You’ll need to apply this procedure on each elasticsearch node.

Java

I’ve decided to install the Oracle JDK as a replacement for OpenJDK, using the following PPA:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update && sudo apt-get install oracle-java7-installer

In case you’re missing the add-apt-repository command, make sure you have the package python-software-properties installed:

$ sudo apt-get install python-software-properties
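
You can verify that the Oracle JDK is now the active Java with the following command (the exact 1.7.0 build number will vary):

$ java -version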

Install via Elasticsearch repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install elasticsearch

You can also enable the elasticsearch service at boot using the following command:

$ sudo update-rc.d elasticsearch defaults 95 10
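
If you want to check the installation right away, start the service and query the REST API on port 9200; it should answer with a small JSON document containing the node name and version:

$ sudo service elasticsearch start
$ curl http://localhost:9200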

Configuration

You’ll need to edit the elasticsearch configuration file in /etc/elasticsearch/elasticsearch.yml and update the following parameters:

  • cluster.name: my-cluster-name

I suggest replacing the default cluster name with a name of your own, especially if you plan to run another cluster on your network with multicast enabled.

  • index.number_of_replicas: 2

This will ensure a copy of your data on every node of your cluster. Set this property to N-1 where N is the number of nodes in your cluster.

  • gateway.recover_after_nodes: 2

This will ensure the recovery process will start after at least 2 nodes in the cluster have been started.

  • discovery.zen.minimum_master_nodes: 2

Should be set to something like N/2 + 1 where N is the number of nodes in your cluster. This is to avoid the “split-brain” scenario.

See this post for more information on this scenario: http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/
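
Putting the above together, here is a sketch of the relevant excerpt of /etc/elasticsearch/elasticsearch.yml for a hypothetical 3-node cluster (the cluster name is only an example):

# /etc/elasticsearch/elasticsearch.yml (excerpt, 3-node cluster)
cluster.name: my-cluster-name

# N-1 replicas = a copy of every shard on each of the 3 nodes
index.number_of_replicas: 2

# wait for at least 2 nodes before starting the recovery process
gateway.recover_after_nodes: 2

# N/2 + 1 = 2 for a 3-node cluster, to avoid the split-brain scenario
discovery.zen.minimum_master_nodes: 2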

Disabling multicast

Multicast is not recommended in production; disabling it gives you more control over your cluster:

  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.ping.unicast.hosts: ["host-1", "host-2"]

Of course, you’ll need to specify the two other hosts for each node in your cluster:

  • host-1 will communicate with host-2 & host-3
  • host-2 will communicate with host-1 & host-3
  • host-3 will communicate with host-1 & host-2
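
Once every node has been configured and restarted, you can check that they see each other; for example, from host-1 (any node will do), the cluster health should eventually report 3 nodes and a green status:

$ curl 'http://host-1:9200/_cat/nodes?v'
$ curl 'http://host-1:9200/_cluster/health?pretty'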

Cluster overview via Marvel

It’s free for development use! See Marvel’s homepage for more info.

Install it:

$ /usr/share/elasticsearch/bin/plugin -i elasticsearch/marvel/latest

Restart the elasticsearch service:

$ sudo service elasticsearch restart

Now you can access the marvel UI via your browser on any of your elasticsearch nodes.
For example the first node: http://elasticsearch-host-a:9200/_plugin/marvel

Automatic index cleaning via Curator

This tool only needs to be installed on one node.

You can use the curator program to delete indexes. See more information in the GitHub repository: https://github.com/elasticsearch/curator

You’ll need pip in order to install curator:

$ sudo apt-get install python-pip

Once it’s done, you can install curator:

$ sudo pip install elasticsearch-curator

Now, it’s easy to set up a cron job to delete the indexes older than 30 days in /etc/cron.d/elasticsearch_curator:

@midnight     root        curator delete --older-than 30 >> /var/log/curator.log 2>&1
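
Before relying on the cron job, you can list the existing indexes (Logstash names them logstash-YYYY.MM.DD by default) to see what would be deleted:

$ curl 'http://localhost:9200/_cat/indices?v'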

Set up the Logstash node

Java

Logstash runs on Java, so you need to ensure a JDK is installed on your system. Use either OpenJDK or the Oracle JDK.

Install via repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/logstash/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install logstash

Generate an SSL certificate

Use the following command to generate a self-signed SSL certificate and its private key in /etc/ssl:

$ openssl req -x509 -newkey rsa:2048 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 1095
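
You can then inspect the generated certificate, its subject and validity period, with:

$ openssl x509 -in /etc/ssl/logstash.pub -noout -subject -dates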

Configuration

I’ll skip the configuration of inputs and filters and only describe the output configuration for communication with the elasticsearch cluster.

We’re no longer going to specify an elasticsearch host; instead, we will specify that this logstash instance needs to join the cluster.

/etc/logstash/conf.d/10_output.conf

output {
       elasticsearch { }
}
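
As a side note, the elasticsearch output of Logstash 1.4 also accepts a cluster option; a minimal sketch if you prefer to set the cluster name directly in the output instead of the separate elasticsearch.yml used below:

output {
       elasticsearch {
              # join the cluster as a client node (the default "node" protocol)
              cluster => "my-cluster-name"
       }
}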

Then we’ll edit the logstash init script in /etc/init/logstash.conf and update the LS_JAVA_OPTS variable with:

LS_JAVA_OPTS="-Djava.io.tmpdir=${LS_HOME} -Des.config=/etc/logstash/elasticsearch.yml"

And create the file /etc/logstash/elasticsearch.yml with the following content:

cluster.name: my-cluster-name
node.name: logstash-indexer-01

If you’ve disabled multicast, then you’ll need to add the following line:

discovery.zen.ping.unicast.hosts: ["elasticsearch-node-1", "elasticsearch-node-2", "elasticsearch-node-3"]

And start logstash:

$ sudo service logstash start

The logstash node will automatically be added to your elasticsearch cluster.
You can verify that by checking the node count and version in the marvel UI.
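
You can also check it from the command line by listing the cluster nodes from any elasticsearch host; logstash-indexer-01 should appear in the list:

$ curl 'http://elasticsearch-node-1:9200/_cat/nodes?v'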

14 thoughts on “How to set up an Elasticsearch cluster with Logstash on Ubuntu 12.04”

  1. Hi
    I am trying to achieve logstash output to an elasticsearch cluster and followed the process mentioned here. However, when I place the elasticsearch.yml under /etc/logstash/conf.d (I am on CentOS), it throws an error stating that it expects an input, filter or output.
    Please suggest, as this is important and I need to get it done ASAP.
    Appreciate your help and response.

  2. Hi,

    I am trying to achieve the ELK cluster with an auto-discovery method.
    I am using Nirmata for auto-discovery of ELK, where Nirmata maintains the Docker containers.
    For the cluster environment I am using ClusterK. ELK should be connected to Couchbase for getting the Couchbase data.
    If the containers are running in a cluster and one container dies and another comes up,
    will it have all the data from Couchbase or only the present data that it will pull?
    How can I make the sync between clusters happen?
    It would be great if you could give your suggestions on the above.

    Regards,
    Vijay

    1. Hello Vijay, I’ve never used any of the technologies you’re speaking about (neither Nirmata nor ClusterK). The only suggestion I can give is to try it out 🙂

      1. Hi deviantony,

        Sure, I am trying the same. Can you give your input on keeping ELK clusters in sync?
        When a new VM joins the cluster, will it have all the data that the three existing nodes have, or only the new data?

        Regards,
        Vijay

  3. By default an ELK cluster will sync the data between its hosts.

    By altering the index.number_of_replicas parameter you can ensure the number of “copies” that will be available in the cluster.
    In order to have a fault-tolerant cluster you’ll need to tune that parameter.

    When a new Elasticsearch host joins the cluster, it will start to replicate the data you have in your cluster (depending on the data you have, it can be slow).

  4. Thank you for this helpful description
    I’m working to collect logs from 3000 servers.
    I installed 2 servers with Elasticsearch and the logstash indexer,
    2 with redis, the logstash shipper and elasticsearch.
    So I have 4 instances of Elasticsearch.
    I tried to find an example of a cluster configuration:

    node.master: true
    discovery.zen.minimum_master_nodes: 3
    node.data: true
    discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]

    node.master: true
    discovery.zen.minimum_master_nodes: 3
    node.data: true
    discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]

    node.master: true
    discovery.zen.minimum_master_nodes: 3
    node.data: true
    discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]

    node.master: false
    discovery.zen.minimum_master_nodes: 3
    node.data: true
    discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]

    Is it correct?

    1. Hello,
      About the discovery.zen.ping.unicast.hosts, you don’t have to specify every host. For instance, es01 should only know about es02, es03 and es04. And for the node.master option, I’ve never used it.

  5. You made my day 🙂

    I have a setup with logstash and kibana with VRRP on 2 machines and ES on 2 other machines. Using this method starts ES and binds it to the correct IP. Before, starting ES separately and logstash afterwards made logstash bind to any address on port 9301, which made it use the wrong source IP when communicating with the local ES.

    ———————————————————–
    /etc/elasticsearch/elasticsearch.yml
    cluster.name: elastic-stage
    node.name: "qasyslog02a"
    node.master: false
    node.data: false

    network.host: 10.136.135.16

    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: [ "qaelastic01a.sfv.se", "qaelastic01b.sfv.se" ]

    ———————————————————–
    logstash

    input {
      udp {
        port => 5140
        type => syslog
        host => "10.136.135.129"
      }
    }

    output {
      elasticsearch {
        template => "/etc/logstash/sfv-template.json"
        template_overwrite => true
        manage_template => true
      }
    }

  6. >> gateway.recover_after_nodes: 2

    It looks like you could set this parameter to 1, as long as you have one replica per node.
