Hey there!
I’ve recently hit the limitations of a one-node Elasticsearch cluster in my ELK setup; see my previous blog post: Centralized logging with an ELK stack (Elasticsearch-Logback-Kibana) on Ubuntu.
After more research, I decided to upgrade the stack architecture, and more precisely the Elasticsearch cluster and the Logstash integration with it.
I’ve been using the following software versions:
- Elasticsearch 1.4.1
- Logstash 1.4.2
Setup the Elasticsearch cluster
You’ll need to apply this procedure on each elasticsearch node.
Java
I’ve decided to install the Oracle JDK as a replacement for OpenJDK, using the following PPA:
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update && sudo apt-get install oracle-java7-installer
In case you’re missing the add-apt-repository command, make sure you have the package python-software-properties installed:
$ sudo apt-get install python-software-properties
Install via Elasticsearch repository
$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install elasticsearch
You can also enable the elasticsearch service at boot using the following command:
$ sudo update-rc.d elasticsearch defaults 95 10
Configuration
You’ll need to edit the elasticsearch configuration file in /etc/elasticsearch/elasticsearch.yml and update the following parameters:
- cluster.name: my-cluster-name
I suggest replacing the default cluster name with a defined one, especially if you want to run another cluster on your network with multicast enabled.
- index.number_of_replicas: 2
This ensures a copy of your data on every node of your cluster. Set this property to N-1, where N is the number of nodes in your cluster.
- gateway.recover_after_nodes: 2
This ensures the recovery process only starts after at least 2 nodes in the cluster have been started.
- discovery.zen.minimum_master_nodes: 2
Set this to N/2 + 1 (rounded down), where N is the number of nodes in your cluster. This avoids the “split-brain” scenario.
See this post for more information on this scenario: http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/
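As a sanity check, the N/2 + 1 rule can be computed with plain shell arithmetic (a throwaway sketch; N=3 here matches this article’s three-node cluster):

```shell
# Quorum rule: minimum_master_nodes = N/2 + 1 (integer division),
# where N is the number of nodes in the cluster.
N=3
QUORUM=$(( N / 2 + 1 ))
echo "discovery.zen.minimum_master_nodes: ${QUORUM}"
# prints: discovery.zen.minimum_master_nodes: 2
```

Note that for an even number of nodes (e.g. N=4), the quorum is 3, so the cluster cannot split into two halves that both elect a master.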
Disabling multicast
Multicast is not recommended in production; disabling it gives you more control over your cluster:
- discovery.zen.ping.multicast.enabled: false
- discovery.zen.ping.unicast.hosts: ["host-1", "host-2"]
Of course, you’ll need to specify the two other hosts for each node in your cluster:
- host-1 will communicate with host-2 & host-3
- host-2 will communicate with host-1 & host-3
- host-3 will communicate with host-1 & host-2
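Putting the pieces together, the relevant section of /etc/elasticsearch/elasticsearch.yml on host-1 would look something like this (a sketch using this article’s example names; adjust the unicast host list per node):

```yaml
cluster.name: my-cluster-name
index.number_of_replicas: 2
gateway.recover_after_nodes: 2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
# On host-1, list the two other nodes of the cluster:
discovery.zen.ping.unicast.hosts: ["host-2", "host-3"]
```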
Cluster overview via Marvel
It’s free for development use! See Marvel’s homepage for more info.
Install it:
$ /usr/share/elasticsearch/bin/plugin -i elasticsearch/marvel/latest
Restart the elasticsearch service:
$ sudo service elasticsearch restart
Now you can access the Marvel UI via your browser on any of your Elasticsearch nodes, for example the first node: http://elasticsearch-host-a:9200/_plugin/marvel
Automatic index cleaning via Curator
This tool only needs to be installed on one node.
You can use the curator program to delete old indexes. See the GitHub repository for more information: https://github.com/elasticsearch/curator
You’ll need pip in order to install curator:
$ sudo apt-get install python-pip
Once it’s done, you can install curator:
$ sudo pip install elasticsearch-curator
Now it’s easy to set up a cron job in /etc/cron.d/elasticsearch_curator to delete the indexes older than 30 days:
@midnight root curator delete --older-than 30 >> /var/log/curator.log 2>&1
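To see which daily index that 30-day threshold maps to, you can reproduce Logstash’s index naming locally (a sketch assuming GNU date and the default logstash-YYYY.MM.DD index pattern):

```shell
# Logstash creates one index per day, named logstash-YYYY.MM.DD.
# Indexes whose date is older than this cutoff are the ones curator
# would delete (the -d flag assumes GNU date, as shipped on Ubuntu).
CUTOFF=$(date -d "30 days ago" +logstash-%Y.%m.%d)
echo "oldest index kept: ${CUTOFF}"
```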
Setup the Logstash node
Java
Logstash runs on Java, so you need to ensure a JDK is installed on your system. Either OpenJDK or the Oracle JDK will work.
Install via repository
$ wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb http://packages.elasticsearch.org/logstash/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
$ sudo apt-get update && sudo apt-get install logstash
Generate a SSL certificate
Use the following command to generate a self-signed SSL certificate and its private key in /etc/ssl:
$ openssl req -x509 -newkey rsa:2048 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 1095
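To verify the result, you can inspect the generated certificate with openssl x509. The sketch below runs the same command against /tmp with a fixed -subj so it is non-interactive (the CN value is just an illustration):

```shell
# Generate a throwaway self-signed certificate and key (same flags as
# above, but written to /tmp and with a fixed subject so openssl does
# not prompt interactively).
openssl req -x509 -newkey rsa:2048 -keyout /tmp/logstash.key \
  -out /tmp/logstash.pub -nodes -days 1095 -subj "/CN=logstash-test"
# Print the subject and validity window of the resulting certificate.
openssl x509 -in /tmp/logstash.pub -noout -subject -dates
```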
Configuration
I’ll skip the configuration of inputs and filters, and only cover the output configuration for communicating with the Elasticsearch cluster.
We’re not going to specify an Elasticsearch host anymore; instead, we’ll tell this Logstash instance to join the cluster directly.
/etc/logstash/conf.d/10_output.conf
output {
  elasticsearch { }
}
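As an alternative to the separate es.config file described next, the Logstash 1.4 elasticsearch output can also take the cluster name directly via its cluster option (a sketch; “my-cluster-name” is this article’s example cluster name):

```
output {
  elasticsearch {
    # Join the cluster by name instead of pointing at a single host.
    cluster => "my-cluster-name"
  }
}
```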
Then we’ll edit the Logstash init script in /etc/init/logstash.conf and update the LS_JAVA_OPTS variable:
LS_JAVA_OPTS="-Djava.io.tmpdir=${LS_HOME} -Des.config=/etc/logstash/elasticsearch.yml"
And create the file /etc/logstash/elasticsearch.yml with the following content:
cluster.name: my-cluster-name
node.name: logstash-indexer-01
If you’ve disabled multicast, then you’ll need to add the following line:
discovery.zen.ping.unicast.hosts: ["elasticsearch-node-1", "elasticsearch-node-2", "elasticsearch-node-3"]
And start logstash:
$ sudo service logstash start
The logstash node will automatically be added to your elasticsearch cluster.
You can verify that by checking the node count and version in the marvel UI.
Hi
I am trying to get Logstash output to an Elasticsearch cluster and followed the process mentioned here. However, when I place elasticsearch.yml under /etc/logstash/conf.d (I am on CentOS), it throws an error stating it expects an input, filter, or output.
Please suggest, as this is important and I need to get it done asap.
Appreciate your help and response.
Hello Jay,
There is no input or filter definition specified in this article; check this article (the Logstash setup section) for more info: https://deviantony.wordpress.com/2014/05/19/centralized-logging-with-an-elk-stack-elasticsearch-logback-kibana/
Hi,
I am trying to set up the ELK cluster with the auto-discovery method.
I am using Nirmata for auto discovery of ELK, where Nirmata maintains the Docker containers.
For the cluster environment I am using ClusterK. ELK should be connected to Couchbase for getting the Couchbase data.
If the containers are running in a cluster, and one container dies and another comes up,
will it have all the data from Couchbase or only the data present when it pulls?
How can I make the sync between clusters happen?
If you can give your suggestions on the above.
Regards,
Vijay
Hello Vijay, I’ve never used any of the technologies you’re speaking about (neither Nirmata nor ClusterK). The only suggestion I can give is to try it out 🙂
Hi deviantony,
Sure, I am trying the same. Can you give your input on having a sync between ELK clusters?
When a new VM joins the cluster, will it have all the data that the three clusters have, or only new data?
Regards,
Vijay
By default an Elasticsearch cluster will sync the data between its hosts.
By altering the index.number_of_replicas parameter you can ensure the number of “copies” that will be available in the cluster.
In order to have a fault-tolerant cluster you’ll need to tune that parameter.
When a new Elasticsearch host joins the cluster, it will start to replicate the data you have in your cluster (depending on the data you have, it can be slow).
Thank you for this helpful description
I’m working on collecting logs from 3000 servers.
I installed 2 servers with Elasticsearch and the Logstash indexer,
and 2 with Redis, the Logstash shipper and Elasticsearch.
So I have 4 Elasticsearch instances.
I tried to find an example of a cluster configuration:
node.master: true
discovery.zen.minimum_master_nodes: 3
node.data: true
discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]
node.master: true
discovery.zen.minimum_master_nodes: 3
node.data: true
discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]
node.master: true
discovery.zen.minimum_master_nodes: 3
node.data: true
discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]
node.master: false
discovery.zen.minimum_master_nodes: 3
node.data: true
discovery.zen.ping.unicast.hosts: ["es01", "es02", "es03", "es04"]
Is this correct?
Hello,
About the discovery.zen.ping.unicast.hosts setting: you don’t have to specify every host. For instance, es01 only needs to know about es02, es03 and es04. As for the node.master option, I’ve never used it.
You made my day 🙂
I have a setup with Logstash and Kibana with VRRP on 2 machines and ES on 2 other machines. Using this method starts ES and binds it to the correct IP. Before, when starting ES separately and Logstash afterwards, Logstash would bind to any address on port 9301, which made it use the wrong source IP when communicating with the local ES.
———————————————————–
/etc/elasticsearch/elasticsearch.yml
cluster.name: elastic-stage
node.name: "qasyslog02a"
node.master: false
node.data: false
network.host: 10.136.135.16
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "qaelastic01a.sfv.se", "qaelastic01b.sfv.se" ]
———————————————————–
logstash
input {
  udp {
    port => 5140
    type => syslog
    host => "10.136.135.129"
  }
}
output {
  elasticsearch {
    template => "/etc/logstash/sfv-template.json"
    template_overwrite => true
    manage_template => true
  }
}
>> gateway.recover_after_nodes: 2
It looks like you could set this parameter to 1, as long as you have one replica per node.
Also, you made a mistake:
discovery.zen.mininum_master_nodes: 2
should be minimum.