If you’re trying to set up a highly available RabbitMQ cluster behind HAProxy, your clients may run into a disconnection issue.
This happens because HAProxy applies a client timeout, set with timeout client (the older clitimeout is deprecated). If a connection stays idle for longer than timeout client (in milliseconds by default), HAProxy drops it.
RabbitMQ clients use persistent connections to a broker, which never time out on their own. See the problem here? If your RabbitMQ client is inactive for a period of time, HAProxy will automatically close the connection.
So how do we solve the problem? HAProxy has a clitcpka option, which enables the sending of TCP keepalive packets on the client side.
Let’s use it!
But it doesn’t solve the problem: the disconnection issues are still there. Damn.
While reading a discussion about RabbitMQ and HAProxy on the RabbitMQ mailing list, I found that Tim Watson had pointed out that:
[…]the exact behaviour of tcp keep-alive is determined by the underlying OS/Kernel configuration[…]
On Ubuntu 14.04, the tcp man page shows that the default value of the tcp_keepalive_time parameter is 2 hours. This parameter defines how long a connection must be idle before TCP begins sending keepalive packets. So if timeout client is shorter than that, HAProxy drops the idle connection long before the first keepalive packet is ever sent.
You can also verify it by using the following command:
    $ cat /proc/sys/net/ipv4/tcp_keepalive_time
    7200
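The same check can be scripted. Here is a minimal Python sketch that reads the three keepalive settings Linux exposes under /proc/sys/net/ipv4. The function name and the base parameter are my own, added only so the lookup can be pointed at another directory; this is an illustration, not part of HAProxy or RabbitMQ.

```python
from pathlib import Path

# Standard Linux location of the TCP keepalive sysctls.
PROC_IPV4 = Path("/proc/sys/net/ipv4")

# Idle time before the first probe (s), delay between probes (s),
# and number of unanswered probes before the connection is dropped.
KEEPALIVE_NAMES = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
)


def read_keepalive_settings(base=PROC_IPV4):
    """Return the kernel keepalive settings as a dict of integers.

    `base` is a hypothetical override for the directory to read from.
    """
    return {
        name: int((Path(base) / name).read_text().strip())
        for name in KEEPALIVE_NAMES
    }
```

Calling read_keepalive_settings() on the HAProxy host returns a dictionary whose tcp_keepalive_time entry is the value to compare against timeout client.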
OK! Let’s raise the timeout client value in our HAProxy configuration for AMQP. 3 hours should be good, since that is longer than the 2-hour keepalive time. And that’s it! No more disconnection issues 🙂
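The reasoning boils down to one comparison: timeout client has to be longer than tcp_keepalive_time. Here is a rough Python sketch of that check. Both helper names are made up for this post, and the parser only handles simple values such as 3h or 50000, treating a bare number as milliseconds (HAProxy's default unit for timeouts).

```python
import re

# Multipliers to milliseconds for HAProxy time suffixes.
_UNIT_MS = {
    "us": 0.001,
    "ms": 1,
    "s": 1000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
}


def haproxy_time_to_ms(value):
    """Convert a simple HAProxy time value ('3h', '50000') to milliseconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognised HAProxy time value: {value!r}")
    number, unit = match.groups()
    # A bare number is interpreted as milliseconds.
    return int(number) * _UNIT_MS[unit or "ms"]


def outlives_keepalive(timeout_client, tcp_keepalive_time_s):
    """True if the proxy timeout is longer than the kernel's keepalive idle time."""
    return haproxy_time_to_ms(timeout_client) > tcp_keepalive_time_s * 1000
```

With the values from this post, outlives_keepalive("3h", 7200) is true, while the default of 50000 ms is far too short for keepalive packets to help.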
Here is a sample HAProxy configuration:
    global
        log 127.0.0.1 local1
        maxconn 4096
        #chroot /usr/share/haproxy
        user haproxy
        group haproxy
        daemon
        #debug
        #quiet

    defaults
        log global
        mode tcp
        option tcplog
        retries 3
        option redispatch
        maxconn 2000
        timeout connect 5000
        timeout client 50000
        timeout server 50000

    listen stats :1936
        mode http
        stats enable
        stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /

    listen aqmp_front :5672
        mode tcp
        balance roundrobin
        timeout client 3h
        timeout server 3h
        option clitcpka
        server aqmp-1 rabbitmq1.domain:5672 check inter 5s rise 2 fall 3
        server aqmp-2 rabbitmq2.domain:5672 check inter 5s rise 2 fall 3
Enjoy your highly available RabbitMQ cluster!
I think there may be another solution to this problem: the heartbeat feature of RabbitMQ. See more about that here: https://www.rabbitmq.com/reliability.html
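I haven't tested the heartbeat approach myself, but the idea would be that heartbeat frames count as traffic, so the connection never looks idle to HAProxy. As far as I understand the RabbitMQ documentation, frames are sent roughly every half of the negotiated heartbeat timeout. The sketch below is only a sanity check built on that assumption, and the function name is hypothetical.

```python
def heartbeat_keeps_connection_alive(heartbeat_timeout_s, proxy_timeout_s):
    """Check that heartbeat frames arrive more often than the proxy's idle timeout.

    Assumption: frames are sent about every heartbeat_timeout / 2 seconds.
    """
    send_interval_s = heartbeat_timeout_s / 2
    return send_interval_s < proxy_timeout_s
```

For example, a 60-second heartbeat would produce traffic about every 30 seconds, which stays under a 50-second timeout client, so in principle the 3h override would not be needed.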