86

Is there any way to create a dump file that contains all the data of an index along with its settings and mappings?

Something similar to what MongoDB does with mongodump,
or the way Solr's data folder can be copied to a backup location.

Cheers!

Evan P

10 Answers

65

Here's a new tool we've been working on for exactly this purpose: https://github.com/taskrabbit/elasticsearch-dump. You can export indices into or out of JSON files, or from one cluster to another.

Evan
  • I tried this. While the index data was 112 MB, the exported JSON was about 3 times that, around 337 MB, and it took 40 minutes. If the index is very large, I don't know whether it is really practical given the time it takes. How does it benchmark against the snapshot feature provided by ES out of the box? – Sanjeev Kumar Dangi May 19 '15 at 13:10
  • @Evan does elasticdump use the scan and scroll feature of Elasticsearch internally? – Sanjeev Kumar Dangi May 19 '15 at 13:15
  • you can try setting --limit=10000 – Arvin Oct 14 '21 at 12:43
32

Elasticsearch supports a snapshot function out of the box:

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
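
For example, registering a shared-filesystem repository and taking a snapshot would look roughly like this (a sketch only; the repository name, path and index are placeholders, and the path must also be whitelisted via path.repo in elasticsearch.yml):

# Register a filesystem snapshot repository
curl -X PUT 'localhost:9200/_snapshot/my_backup' \
  -H 'Content-Type: application/json' \
  -d '{ "type": "fs", "settings": { "location": "/mount/backups/my_backup" } }'

# Take a snapshot of a single index and wait for it to finish
curl -X PUT 'localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true' \
  -H 'Content-Type: application/json' \
  -d '{ "indices": "my_index", "ignore_unavailable": true, "include_global_state": false }'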

Andreas Neumann
  • While very handy, the snapshots don't really export your data in a usable format (JSON, CSV, etc.) – Evan Nov 18 '14 at 03:24
  • @Andreas Neumann Can the snapshot feature take a backup of a subset of documents in an index? Suppose an index has 1000 docs and I want to back up only 500 of them. I would like to do this because I want to import a set of documents and restore them into some other index to do some load testing. I don't need all the docs for my task, the index is huge, and I don't want to take a snapshot of all the data as it would take a lot of time. – Sanjeev Kumar Dangi May 19 '15 at 13:12
  • Really don't understand why this is the accepted answer, given that it has nothing to do with the original question (which asked for a way to dump ES documents as JSON, not for a way to back them up using snapshots). – Webreaper May 12 '20 at 21:24
18

We can use elasticdump to take a backup and restore it. With it, we can move data from one server/cluster to another server/cluster.

1. Commands to move one index's data from one server/cluster to another using elasticdump:

# Copy an index from production to staging with analyzer and mapping:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=analyzer
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

2. Commands to move all indices' data from one server/cluster to another using multielasticdump:

Backup

multielasticdump \
  --direction=dump \
  --match='^.*$' \
  --limit=10000 \
  --input=http://production.es.com:9200 \
  --output=/tmp 

Restore

multielasticdump \
  --direction=load \
  --match='^.*$' \
  --limit=10000 \
  --input=/tmp \
  --output=http://staging.es.com:9200 

Note:

  • If the --direction is dump, which is the default, --input MUST be a URL for the base location of an ElasticSearch server (i.e. http://localhost:9200) and --output MUST be a directory. Each index that does match will have a data, mapping, and analyzer file created.

  • For loading files that you have dumped with multielasticdump, --direction should be set to load, --input MUST be a directory of a multielasticdump dump, and --output MUST be an Elasticsearch server URL.

  • The second command will take a backup of the settings, mappings, templates and the data itself as JSON files.

  • The --limit should not be more than 10000; otherwise, it will throw an exception.

  • Get more details here.
Keshav Lodhi
  • You may need to add `--ignoreChildError=true` to the `multielasticdump` commands. Also, setting the `--limit` lower than 10000 (for example, to 1000) may be necessary to prevent some other types of obscure errors. The latter applies to `elasticdump` command as well. – Roland Pihlakas Jan 12 '21 at 20:14
  • For `multielasticdump --direction=load` I had to add `--ignoreType='template'`. – Roland Pihlakas Jan 12 '21 at 20:25
  • note that you need a trailing slash after the URL, e.g. `multielasticdump ... --input=http://production.es.com:9200/ ...` instead of just `multielasticdump ... --input=http://production.es.com:9200 ...`. It fails with some very non-descriptive error otherwise. – blubb Jan 22 '21 at 13:33
15

For your case, Elasticdump is the perfect answer.
First you need to dump the mapping, and then the data:

# Install the elasticdump 
npm install elasticdump -g

# Dump the mapping 
elasticdump --input=http://<your_es_server_ip>:9200/index --output=es_mapping.json --type=mapping

# Dump the data
elasticdump --input=http://<your_es_server_ip>:9200/index --output=es_index.json --type=data    

If you want to dump the data on any server, I advise you to run elasticdump through Docker. You can get more info from this website: Blog Link
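
If you go the Docker route, the commands look roughly like this (a sketch; the image name below is the one published in the elasticsearch-dump README, and the host path and index name are placeholders):

# Mount a local directory so the dumped files end up on the host
docker run --rm -ti -v /data:/tmp elasticdump/elasticsearch-dump \
  --input=http://<your_es_server_ip>:9200/index \
  --output=/tmp/es_mapping.json \
  --type=mapping
docker run --rm -ti -v /data:/tmp elasticdump/elasticsearch-dump \
  --input=http://<your_es_server_ip>:9200/index \
  --output=/tmp/es_index.json \
  --type=data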

MD Nasirul Islam
13

Elasticsearch itself provides a way to back up and restore your data. The simple command to do it is:

curl -X PUT 'localhost:9200/_snapshot/<backup_folder name>/<backupname>' -H 'Content-Type: application/json' -d '{
    "indices": "<index_name>",
    "ignore_unavailable": true,
    "include_global_state": false
}'

Now, how to create this folder, how to include its path in the Elasticsearch configuration so that it is available to Elasticsearch, and the restoration method are all well explained here. To see a practical demo, look here.
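
A matching restore call is sketched below (the repository and snapshot names are the same placeholders as above; the target index should usually be closed or deleted before restoring):

curl -X POST 'localhost:9200/_snapshot/<backup_folder name>/<backupname>/_restore' \
  -H 'Content-Type: application/json' \
  -d '{ "indices": "<index_name>", "include_global_state": false }'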

Rohini Choudhary
2

The data itself is one or more Lucene indices, since you can have multiple shards. What you also need to back up is the cluster state, which contains all sorts of information regarding the cluster, the available indices, their mappings, the shards they are composed of, etc.

It's all within the data directory though, so you can just copy it; its structure is pretty intuitive. Right before copying, it's better to disable automatic flushing (in order to back up a consistent view of the index and avoid writes to it while copying the files), issue a manual flush, and disable shard allocation as well. Remember to copy the directory from all nodes.
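
On current versions, those preparation steps would look roughly like this (a sketch only; the exact setting names differ between old and new Elasticsearch releases):

# Stop shards from being moved around while you copy the files
curl -X PUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{ "transient": { "cluster.routing.allocation.enable": "none" } }'

# Force a flush so everything in the translog is written out to the Lucene segments
curl -X POST 'localhost:9200/_flush'

# ... copy the data directory from every node ...

# Re-enable allocation afterwards
curl -X PUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{ "transient": { "cluster.routing.allocation.enable": "all" } }'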

Also, the next major version of elasticsearch is going to provide a new snapshot/restore API that will allow you to perform incremental snapshots and restore them via the API too. Here is the related github issue: https://github.com/elasticsearch/elasticsearch/issues/3826.

javanna
  • In fact I've tried this solution and it didn't work for me. I copied the data folder as you said to the new remote installation and tried to start ES, but it died due to exceptions. Unfortunately I don't have the log to paste here. Did your solution succeed? – Evan P Oct 09 '13 at 12:00
  • Yes that's how you should do it. Did you try to restore the index to the same elasticsearch version? Similar machine? Was it a big index? Single node? – javanna Oct 09 '13 at 12:19
  • It was a single node, 5 shards, a small index (~2k documents) but different machine. The version was the same (0.9) – Evan P Oct 10 '13 at 07:58
2

You can also dump elasticsearch data in JSON format with HTTP requests, using the scroll API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

curl -X POST 'https://ES/INDEX/_search?scroll=10m'
curl -X POST 'https://ES/_search/scroll' -H 'Content-Type: application/json' -d '{"scroll": "10m", "scroll_id": "ID"}'
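
A complete dump loop built on those two calls could look roughly like this (a sketch assuming jq is installed; ES, INDEX and the page size are placeholders):

# First search opens the scroll context and returns the first page of hits
RESP=$(curl -s -X POST 'https://ES/INDEX/_search?scroll=10m' \
  -H 'Content-Type: application/json' -d '{"size": 1000}')

# Keep fetching pages until an empty one comes back
while [ "$(echo "$RESP" | jq '.hits.hits | length')" -gt 0 ]; do
  echo "$RESP" | jq -c '.hits.hits[]' >> dump.json
  SCROLL_ID=$(echo "$RESP" | jq -r '._scroll_id')
  RESP=$(curl -s -X POST 'https://ES/_search/scroll' \
    -H 'Content-Type: application/json' \
    -d "{\"scroll\": \"10m\", \"scroll_id\": \"$SCROLL_ID\"}")
done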

mirik
2

At the time of writing this answer (2021), the official way of backing up an Elasticsearch cluster is to snapshot it. Refer to: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html

Binita Bharati
  • Thank you for this! I've updated the question with your answer as the right one, since it's the latest right way to do this. – Evan P Jan 12 '22 at 15:34
1

To export all documents from ElasticSearch into JSON, you can use the esbackupexporter tool. It works with index snapshots. It takes the container with snapshots (S3, Azure blob or file directory) as the input and outputs one or several zipped JSON files per index per day. It is quite handy when exporting your historical snapshots. To export your hot index data, you may need to make the snapshot first (see the answers above).

cloudvyzor
0

If you want to massage the data on its way out of Elasticsearch, you might want to use Logstash. It has a handy Elasticsearch Input Plugin.

And then you can export to anything, from a CSV file to reindexing the data on another Elasticsearch cluster. Though for the latter you also have Elasticsearch's own Reindex API. A rough example follows.
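
As a sketch, a pipeline that reads an index and writes CSV could look like this (the index, field names and paths are illustrative, and the csv output plugin may need to be installed separately with logstash-plugin):

cat > es-to-csv.conf <<'EOF'
input {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my_index"
    query => '{ "query": { "match_all": {} } }'
  }
}
output {
  csv {
    fields => ["field1", "field2"]
    path => "/tmp/my_index.csv"
  }
}
EOF
logstash -f es-to-csv.conf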

Radu Gheorghe