
S3 for Elasticsearch Backups

by Nonbeing, Tuesday, 10 May 2016

ELK FTW

As you probably already know, the “ELK Stack” consists of three services: Elasticsearch, Logstash, and Kibana. Although they all work rather well together, each one is a separate project driven by Elastic, an open-source vendor that has been making waves in the analytics community.

ELK is now popular enough to rival the dominance of Splunk, the log analytics behemoth and historical market leader - especially in the enterprise. Elasticsearch alone is downloaded more than half a million times a month - far more than even the likes of Ubuntu (source: DistroWatch) - making it one of the hottest open-source projects around.

No wonder we’re routinely seeing articles like “Is Open Source Overtaking Splunk” and this eye-opening Google Trends report.

Elasticsearch Backup and Restore

So Elasticsearch (“You know, for search” [sorry, couldn’t resist :P]) is really popular - duh! We think Elastic.co has done a fantastic job at all levels. Personally, I feel compelled to say they’ve absolutely nailed the API: it’s my go-to reference for how a REST API should be defined and implemented.

In particular, we recently consumed the excellent Elasticsearch Snapshot+Restore API to solve a problem for a customer.

How NOT to do it

The customer was running Elasticsearch (ES) clusters on AWS EC2. As part of their production deployments, they needed to push their Elasticsearch data from one cluster (QA) to another Elasticsearch cluster (Prod) - deliberately clobbering the existing target ES data in the process.

When they came to us, they were doing this by:

  • creating file-system snapshots of their indices on the source cluster
  • scp-ing the snapshots (dozens of GBs) to the target cluster, which could take hours
  • restoring the snapshots from the file-system on the target cluster
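
In API terms, that round-trip looked roughly like the sketch below. The hostnames, repository name, and path are all illustrative; the Snapshot API calls are real, but this is not the customer’s actual tooling:

    import requests

    SOURCE_ES = "http://qa-es:9200"     # source (QA) cluster - illustrative URL
    TARGET_ES = "http://prod-es:9200"   # target (Prod) cluster - illustrative URL
    REPO = "fs_backup"                  # filesystem repo registered on BOTH clusters

    # 1. Register a filesystem snapshot repository on the source cluster.
    #    The location must be whitelisted via path.repo in elasticsearch.yml.
    requests.put(SOURCE_ES + "/_snapshot/" + REPO, json={
        "type": "fs",
        "settings": {"location": "/mnt/es-backups"}
    })

    # 2. Snapshot all indices and wait for the snapshot to finish.
    requests.put(SOURCE_ES + "/_snapshot/" + REPO + "/snap_1",
                 params={"wait_for_completion": "true"})

    # 3. The painful part: copy everything under /mnt/es-backups to the target
    #    cluster out-of-band (scp/rsync) - dozens of GBs, potentially hours.

    # 4. Finally, restore on the target cluster from its own copy of the repo.
    requests.post(TARGET_ES + "/_snapshot/" + REPO + "/snap_1/_restore")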

Needless to say, this was tedious, slow and difficult to automate. Clearly, there had to be a better way.

S3 To The Rescue

The customer was already all-in on AWS and their ES clusters were running on EC2, so the solution was fairly obvious - just use S3 as the ES snapshot repository! In fact, it’s such a no-brainer for this kind of use-case that Elastic now supports it out-of-the-box: the superb (and officially supported) AWS Cloud Plugin lets you set an S3 bucket+path as an ES snapshot repository.
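
To give a flavour of how simple this is, here’s a sketch of registering an S3 repository - a single API call. The bucket name, region, base path, and repository name are made up, the plugin must be installed on every node, and AWS credentials need to come from an IAM role or your configuration:

    import requests

    ES = "http://localhost:9200"   # any node in the cluster - illustrative

    # Register an S3-backed snapshot repository (requires the AWS Cloud plugin
    # on every node, plus AWS credentials via IAM role / elasticsearch.yml).
    resp = requests.put(ES + "/_snapshot/s3_backup", json={
        "type": "s3",
        "settings": {
            "bucket": "my-es-snapshots",   # illustrative bucket name
            "region": "us-east-1",         # the bucket's AWS region
            "base_path": "es/prod"         # optional prefix within the bucket
        }
    })
    resp.raise_for_status()

    # From here on, snapshots go straight to S3:
    requests.put(ES + "/_snapshot/s3_backup/snap_2016_05_10",
                 params={"wait_for_completion": "true"})

Because the repository lives in S3, registering the same bucket and base path on the target cluster makes the snapshots immediately visible there - no file copying required.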

Why S3?

S3 is fantastic for this use-case:

  • Durable backups: S3 offers very high durability and availability, which is exactly what you want from a backup store.
  • Globally accessible: what if your source and target clusters are in totally different regions? S3 is the perfect global repo for this scenario.
  • Backup size: Each ES snapshot is split into many smaller files, so individual objects never come close to S3’s 5 TB object-size limit. Hence S3 can be used to back up even huge indices.
  • Speed: With filesystem-based snapshots, if you aren’t using a shared file system such as NFS, you probably have to ship the snapshot files to the target ES cluster with something like scp or rsync. Even with NFS, this can take a long time if the snapshots are large and spread across many files. With S3 functioning as a global, shared snapshot repository, the speed gains can be significant.

Da Code

We’re pleased to open-source a simple utility for S3-based backup and restore for Elasticsearch.

It has proved to be valuable for our customers and even for us internally. So even though it’s just a small Python script that directly invokes the ES API, we think you might find it useful, even if only as a starting point for a more complex backup+restore solution:

GitHub: elasticsearch-s3-backup

Instructions and details are in the README in the GitHub repo.
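
For a flavour of what the flow looks like - this is a simplified sketch under assumed names, not the actual script from the repo - the whole QA-to-Prod push boils down to a couple of REST calls against a shared S3 repository:

    import requests

    REPO = "s3_backup"         # S3 repo registered on BOTH clusters - illustrative
    SNAPSHOT = "deploy_snap"   # snapshot name - illustrative

    def take_snapshot(source_es):
        """Snapshot all indices on the source cluster into the shared S3 repo."""
        r = requests.put(
            "{0}/_snapshot/{1}/{2}".format(source_es, REPO, SNAPSHOT),
            params={"wait_for_completion": "true"})
        r.raise_for_status()

    def restore_snapshot(target_es):
        """Restore the snapshot on the target cluster (conflicting indices must
        be closed or deleted first; that housekeeping is omitted here)."""
        r = requests.post(
            "{0}/_snapshot/{1}/{2}/_restore".format(target_es, REPO, SNAPSHOT),
            params={"wait_for_completion": "true"})
        r.raise_for_status()

    if __name__ == "__main__":
        take_snapshot("http://qa-es:9200")       # QA cluster - illustrative URL
        restore_snapshot("http://prod-es:9200")  # Prod cluster - illustrative URL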

If you use ES and AWS, we’d love it if you took our little tool for a spin - or used it in your own code. Let us know - contact us at devopsATcloudcover.in with any feedback or suggestions, or just open an issue or pull request directly on GitHub.

Cheers!

Author

Nonbeing

Nonbeing is "Chief DevOps Junkie" at CloudCover. Whether it's making software or breaking software, he loves to "Automate ALL The Things!"