Tips and tricks for LAMP (Linux Apache MySQL PHP) developers.

When I can't find a suitable solution to a problem and have to work it out for myself, I'll post the result here in the hope that others will find it useful. (All code is offered in good faith. It may not be the best solution, just a solution.)

Monday, 2 September 2013

Elasticsearch, Chef and Vagrant

I've been tasked at work with setting up an Elasticsearch cluster. We use Chef for provisioning, and there's an official cookbook available with some instructions, but they assume you are using Amazon EC2, which we are not - we're using our own servers, and Vagrant VMs for testing - so I had to figure a few things out myself.

When I first added the recipe to the node's run list it all installed fine but then I found that Elasticsearch was not running. When I tried running it manually it just said "Killed" and exited. This had me scratching my head for quite a while but I finally found the solution.

In some of the official examples they include the following in the Chef node:

"elasticsearch": {
    "bootstrap.mlockall": true
}
It's not explained what this does, but the template config YAML file says it prevents the JVM from using swap, which would make Elasticsearch perform badly. Fair enough; however, on a virtual machine with very little memory it can mean the JVM doesn't have enough memory to run, so it crashes. True is the default value, so it's not enough to simply leave this setting out - you have to explicitly set it to false.
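On a low-memory Vagrant VM, then, the node attributes need the setting disabled explicitly - something along these lines:

```json
"elasticsearch": {
    "bootstrap.mlockall": false
}
```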

Once I got that working my first node had Elasticsearch running and all was well. Then I started up my second node but I couldn't get it to form a cluster with the first.

As per the documentation I had given them both the same cluster_name. Our servers are spread across different networks, so I couldn't use the default multicast option for discovery; instead I disabled multicast and added the FQDN of each node to the unicast hosts list (the hostnames below are placeholders for our real ones):

"elasticsearch": {
    "discovery.zen.ping.multicast.enabled": false,
    "discovery.zen.ping.unicast.hosts": "[\"node1.example.com\", \"node2.example.com\"]"
}
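Before blaming the discovery config, it's worth confirming that each unicast host is actually reachable on the transport/discovery port (9300) from every node. A quick sketch in plain Ruby (the hostnames are placeholders):

```ruby
require 'socket'

# Returns true if a TCP connection to host:port succeeds within the timeout.
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  false
end

# Check each cluster member on the Elasticsearch discovery port.
%w[node1.example.com node2.example.com].each do |host|
  puts "#{host}:9300 #{port_open?(host, 9300) ? 'reachable' : 'UNREACHABLE'}"
end
```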

Each node has a host entry for each of the other nodes, and they could telnet to each other on the Elasticsearch discovery port (9300) just fine, but when the second node started up I got an error like:

[node2[inet[/]] failed to send join request to master [node1], reason
[org.elasticsearch.transport.RemoteTransportException: [node2[inet[/]][discovery/zen/join]; 
org.elasticsearch.ElasticSearchIllegalStateException: Node [node2[inet[/]] not 
master for join request from [node2[inet[/]]

Huh? Why was node2 trying to connect to node2? It was my colleague who noticed the references to 10.0.2.* IPs where we would've expected 192.168.33.* IPs. It turns out that Vagrant always puts its NAT adapter on eth0, and Elasticsearch was binding to that adapter's IP by default. You can override this with the config (the IP below is a placeholder - use each node's own host-only address):

"elasticsearch": {
    "network.host": "192.168.33.10"
}

Once I'd done that for each node (with their respective IPs), the cluster started working.
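Rather than hard-coding each node's IP in the attributes, the recipe could pick out the host-only address itself. A sketch using Ruby's standard Socket library (the 192.168.33. prefix is our Vagrant host-only network - adjust to yours, and the commented attribute path is hypothetical):

```ruby
require 'socket'

# Returns the first local IPv4 address matching the given prefix, or nil.
def address_with_prefix(prefix)
  Socket.ip_address_list
        .select(&:ipv4?)
        .map(&:ip_address)
        .find { |ip| ip.start_with?(prefix) }
end

# In a Chef recipe you could then do something like:
# node.set[:elasticsearch][:network][:host] = address_with_prefix('192.168.33.')
puts address_with_prefix('192.168.33.') || 'no host-only address found'
```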


TreeFree said...

Just wanted to say thank you for this, unblocked me.

Also, a subtle tweak to your solution allowed me to keep Elasticsearch bound to all interfaces (for an integration test I needed to be able to simply curl 'localhost', as I couldn't get the dynamic IP in at that point).

That tweak was to actually just override the 'publish_host', i.e.:

node.set[:elasticsearch][:network][:publish_host] = node[:ipaddress]

TreeFree said...

Sorry, forgot to also mention that node[:ipaddress] here is actually the eth1 / public NIC - I override the usual Chef default (eth0) with an Ohai plugin, since Vagrant uses eth0 for its own internal networking purposes.

Here is a link to that: