How to set up an Elasticsearch cluster with Logstash on Ubuntu 12.04

Hey there!

I’ve recently hit the limitations of a one-node elasticsearch cluster in my ELK setup; see my previous blog post: Centralized logging with an ELK stack (Elasticsearch-Logback-Kibana) on Ubuntu.

After more research, I’ve decided to upgrade the stack architecture, more precisely the elasticsearch cluster and the logstash integration with the cluster.

I’ve been using the following software versions:

  • Elasticsearch 1.4.1
  • Logstash 1.4.2

Set up the Elasticsearch cluster

You’ll need to apply this procedure on each elasticsearch node.

Java

I’ve decided to install the Oracle JDK as a replacement for OpenJDK, using the following PPA:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update && sudo apt-get install oracle-java7-installer

In case you’re missing the add-apt-repository command, make sure you have the package python-software-properties installed:

$ sudo apt-get install python-software-properties
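
You can verify that the Oracle JDK is now the active Java with the following command (the exact 1.7.0 build number will vary):

$ java -version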

Install via Elasticsearch repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install elasticsearch

You can also enable the elasticsearch service at boot using the following command:

$ sudo update-rc.d elasticsearch defaults 95 10
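
If you want to check the installation right away, start the service and query the REST API on port 9200; it should answer with a small JSON document containing the node name and version:

$ sudo service elasticsearch start
$ curl http://localhost:9200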

Configuration

You’ll need to edit the elasticsearch configuration file in /etc/elasticsearch/elasticsearch.yml and update the following parameters:

  • cluster.name: my-cluster-name

I suggest replacing the default cluster name with a name of your own, especially if you plan to run another cluster on your network with multicast enabled.

  • index.number_of_replicas: 2

This will ensure a copy of your data on every node of your cluster. Set this property to N-1 where N is the number of nodes in your cluster.

  • gateway.recover_after_nodes: 2

This will ensure the recovery process will start after at least 2 nodes in the cluster have been started.

  • discovery.zen.minimum_master_nodes: 2

Should be set to something like N/2 + 1 where N is the number of nodes in your cluster. This is to avoid the “split-brain” scenario.

See this post for more information on this scenario: http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/
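
Putting the above together, here is a sketch of the relevant excerpt of /etc/elasticsearch/elasticsearch.yml for a hypothetical 3-node cluster (the cluster name is only an example):

# /etc/elasticsearch/elasticsearch.yml (excerpt, 3-node cluster)
cluster.name: my-cluster-name

# N-1 replicas = a copy of every shard on each of the 3 nodes
index.number_of_replicas: 2

# wait for at least 2 nodes before starting the recovery process
gateway.recover_after_nodes: 2

# N/2 + 1 = 2 for a 3-node cluster, to avoid the split-brain scenario
discovery.zen.minimum_master_nodes: 2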

Disabling multicast

Multicast is not recommended in production; disabling it gives you more control over your cluster:

  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.ping.unicast.hosts: ["host-1", "host-2"]

Of course, you’ll need to specify the two other hosts for each node in your cluster:

  • host-1 will communicate with host-2 & host-3
  • host-2 will communicate with host-1 & host-3
  • host-3 will communicate with host-1 & host-2
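
Once every node has been configured and restarted, you can check that they see each other; for example, from host-1 (any node will do), the cluster health should eventually report 3 nodes and a green status:

$ curl 'http://host-1:9200/_cat/nodes?v'
$ curl 'http://host-1:9200/_cluster/health?pretty'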

Cluster overview via Marvel

It’s free for development use! See Marvel’s homepage for more info.

Install it:

$ /usr/share/elasticsearch/bin/plugin -i elasticsearch/marvel/latest

Restart the elasticsearch service:

$ sudo service elasticsearch restart

Now you can access the marvel UI via your browser on any of your elasticsearch nodes.
For example the first node: http://elasticsearch-host-a:9200/_plugin/marvel

Automatic index cleaning via Curator

This tool only needs to be installed on one node.

You can use the curator program to delete indexes. See more information in the GitHub repository: https://github.com/elasticsearch/curator

You’ll need pip in order to install curator:

$ sudo apt-get install python-pip

Once it’s done, you can install curator:

$ sudo pip install elasticsearch-curator

Now, it’s easy to set up a cron job to delete the indexes older than 30 days in /etc/cron.d/elasticsearch_curator:

@midnight     root        curator delete --older-than 30 >> /var/log/curator.log 2>&1
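
Before relying on the cron job, you can list the existing indexes (Logstash names them logstash-YYYY.MM.DD by default) to see what would be deleted:

$ curl 'http://localhost:9200/_cat/indices?v'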

Set up the Logstash node

Java

Logstash runs on Java, so you need to ensure a JDK is installed on your system. Use either OpenJDK or the Oracle JDK.

Install via repository

$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/logstash/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install logstash

Generate an SSL certificate

Use the following command to generate a self-signed SSL certificate and its private key in /etc/ssl:

$ openssl req -x509 -newkey rsa:2048 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 1095
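
You can then inspect the generated certificate, its subject and validity period, with:

$ openssl x509 -in /etc/ssl/logstash.pub -noout -subject -dates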

Configuration

I’ll skip the configuration of inputs and filters and only describe the output configuration for communication with the elasticsearch cluster.

We’re no longer going to specify an elasticsearch host; instead, we will specify that this logstash instance needs to join the cluster.

/etc/logstash/conf.d/10_output.conf

output {
       elasticsearch { }
}
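
As a side note, the elasticsearch output of Logstash 1.4 also accepts a cluster option; a minimal sketch if you prefer to set the cluster name directly in the output instead of the separate elasticsearch.yml used below:

output {
       elasticsearch {
              # join the cluster as a client node (the default "node" protocol)
              cluster => "my-cluster-name"
       }
}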

Then we’ll edit the logstash init script in /etc/init/logstash.conf and update the LS_JAVA_OPTS variable with:

LS_JAVA_OPTS="-Djava.io.tmpdir=${LS_HOME} -Des.config=/etc/logstash/elasticsearch.yml"

And create the file /etc/logstash/elasticsearch.yml with the following content:

cluster.name: my-cluster-name
node.name: logstash-indexer-01

If you’ve disabled multicast, then you’ll need to add the following line:

discovery.zen.ping.unicast.hosts: ["elasticsearch-node-1", "elasticsearch-node-2", "elasticsearch-node-3"]

And start logstash:

$ sudo service logstash start

The logstash node will automatically be added to your elasticsearch cluster.
You can verify that by checking the node count and version in the marvel UI.
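
You can also check it from the command line by listing the cluster nodes from any elasticsearch host; logstash-indexer-01 should appear in the list:

$ curl 'http://elasticsearch-node-1:9200/_cat/nodes?v'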

14 thoughts on “How to set up an Elasticsearch cluster with Logstash on Ubuntu 12.04”

  1. Hi
    I am trying to achieve logstash output to an elasticsearch cluster and followed the process mentioned here. However, when I place the elasticsearch.yml under /etc/logstash/conf.d (I am on CentOS), it throws an error stating that it expects an input, filter or output.
    Please suggest, as this is important and I need to get it done ASAP.
    Appreciate your help and response.

  2. Hi,

    I am trying to achieve the ELK cluster with an auto-discovery method.
    I am using Nirmata for auto-discovery of ELK, where Nirmata maintains the Docker containers.
    For the cluster environment I am using ClusterK. ELK should be connected to Couchbase for getting the Couchbase data.
    If the containers are running in a cluster and one container dies and another comes up,
    will it have all the data from Couchbase or only the present data that it will pull?
    How can I make the sync between clusters happen?
    It would be great if you could give your suggestions on the above.

    Regards,
    Vijay

    1. Hello Vijay, I’ve never used any of the technologies you’re speaking about (neither Nirmata nor ClusterK). The only suggestion I can give is to try it out 🙂

      1. Hi deviantony,

        Sure, I am trying the same. Can you give your input on keeping ELK clusters in sync?
        When a new VM joins the cluster, will it have all the data that the three existing nodes have, or only the new data?

        Regards,
        Vijay

  3. By default an ELK cluster will sync the data between its hosts.

    By altering the index.number_of_replicas parameter you can ensure the number of “copies” that will be available in the cluster.
    In order to have a fault-tolerant cluster you’ll need to tune that parameter.

    When a new Elasticsearch host joins the cluster, it will start to replicate the data you have in your cluster (depending on the data you have, it can be slow).

  4. Thank you for this helpful description
    I’m working to collect logs from 3000 servers.
    I installed 2 servers with Elasticsearch and the logstash indexer,
    2 with redis, the logstash shipper and elasticsearch.
    So I have 4 instances of Elasticsearch.
    I tried to find an example of a cluster configuration:

    node.master: true
    discovery.zen.minimum_master_nodes: 3
    node.data: true
    discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]

    node.master: true
    discovery.zen.minimum_master_nodes: 3
    node.data: true
    discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]

    node.master: true
    discovery.zen.minimum_master_nodes: 3
    node.data: true
    discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]

    node.master: false
    discovery.zen.minimum_master_nodes: 3
    node.data: true
    discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]

    Is it correct?

    1. Hello,
      About the discovery.zen.ping.unicast.hosts, you don’t have to specify every host. For instance, es01 should only know about es02, es03 and es04. And for the node.master option, I’ve never used it.

  5. You made my day 🙂

    I have a setup with logstash and kibana with VRRP on 2 machines and ES on 2 other machines. Using this method starts ES and binds it to the correct IP. Before, starting ES separately and logstash afterwards made logstash bind to any address on port 9301, which made it use the wrong source IP when communicating with the local ES.

    ———————————————————–
    /etc/elasticsearch/elasticsearch.yml
    cluster.name: elastic-stage
    node.name: "qasyslog02a"
    node.master: false
    node.data: false

    network.host: 10.136.135.16

    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: [ "qaelastic01a.sfv.se", "qaelastic01b.sfv.se" ]

    ———————————————————–
    logstash

    input {
      udp {
        port => 5140
        type => syslog
        host => "10.136.135.129"
      }
    }

    output {
      elasticsearch {
        template => "/etc/logstash/sfv-template.json"
        template_overwrite => true
        manage_template => true
      }
    }

  6. >> gateway.recover_after_nodes: 2

    It looks like you could set this parameter to 1, as long as you have one replica per node.
