73

Let's say I have movie data in Elasticsearch, and I created the documents like this:

curl -XPUT "http://192.168.0.2:9200/movies/movie/1" -d'
{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972
}'

I have a bunch of movies from different years. I want to copy all the movies from a particular year (say, 1972) to a new index, "70sMovies", but I couldn't see how to do that.

BSMP
cybergoof

9 Answers

158

Since Elasticsearch 2.3 you can use the built-in _reindex API.

For example:

POST /_reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

Or copy only a specific part of the index by adding a filter/query:

POST /_reindex
{
  "source": {
    "index": "twitter",
    "query": {
      "term": {
        "user": "kimchy"
      }
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}

Read more: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
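
Applied to the question's data, the filtered form might look like the sketch below. It uses only Python's standard library. The cluster URL, the helper names `build_reindex_body` and `post_json`, and the lowercase destination name `70s_movies` are assumptions for illustration (Elasticsearch rejects index names that contain uppercase letters, so `70sMovies` would need renaming).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: point this at your own cluster


def build_reindex_body(source_index, dest_index, year):
    """Return a _reindex request body that copies only documents from `year`."""
    return {
        "source": {
            "index": source_index,
            "query": {"term": {"year": year}},
        },
        "dest": {"index": dest_index},
    }


def post_json(path, body):
    """POST a JSON body to the cluster and return the decoded JSON response."""
    request = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_reindex_body("movies", "70s_movies", 1972)
print(json.dumps(body, indent=2))
# Against a live cluster you would then call:
# print(post_json("/_reindex", body))
```

The network call is left commented out so the request body can be inspected before anything is sent.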

Ludo - Off the record
  • Work perfectly, thanks for sharing (debian/jessie, ES 5.2.2, curl). – Takman Mar 17 '17 at 19:41
  • 5
    Note that it does not copy the mappings and settings (for example if you have total field settings greater than default 1000). In this case you should create a new index with the settings and then run _reindex. See: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/docs-reindex.html#docs-reindex – Kfir Erez Feb 11 '18 at 17:37
  • it won't work if the index is configured with _source as false – Sujal Mandal Nov 29 '18 at 09:00
  • 3
    AFAIK, the destination index has to be pre-configured with *mappings* – YetAnotherBot Feb 26 '19 at 12:01
  • If your model follows a specific naming pattern for these indices, you can use the _template API - https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-templates.html – gravetii Nov 15 '20 at 17:10
55

The best approach would be to use the elasticsearch-dump tool: https://github.com/taskrabbit/elasticsearch-dump

A real-world example I used:

elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=mapping
elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=data
MAQ
  • 1
    This elasticsearch-dump is a much better option because it can copy data between different clusters and supports several ES versions. – JonyD Feb 23 '17 at 09:57
5

Check out knapsack: https://github.com/jprante/elasticsearch-knapsack

Once you have the plugin installed and working, you could export part of your index via query. For example:

curl -XPOST 'localhost:9200/test/test/_export' -d '{
"query" : {
    "match" : {
        "myfield" : "myvalue"
    }
},
"fields" : [ "_parent", "_source" ]
}'

This will create a tarball with only your query results, which you can then import into another index.

coffeeaddict
5

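To reindex a specific type from the source index into a type in the destination index, the syntax is as follows. Replace the example term query with your own filter criteria. Mapping types were deprecated in 7.x and removed in 8.0, so drop the "type" fields on newer versions.

POST _reindex
{
  "source": {
    "index": "source_index",
    "type": "source_type",
    "query": {
      "term": { "year": 1972 }
    }
  },
  "dest": {
    "index": "dest_index",
    "type": "dest_type"
  }
}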
Ramesh Papaganti
4

If the intent is to copy the entire data to an index with the same settings/mappings as the original index, one could use the clone API. It copies the whole index, so a subset of the documents has to be carved out afterwards. Something like:

POST /<index>/_clone/<target-index>

OR

PUT /<index>/_clone/<target-index>

However, if the intent is to copy the data to a new index with different settings/mappings than the original index, one could use the reindex API instead. Something like the following, where the example term query stands in for your own filter criteria:

POST _reindex
{
  "source": {
    "index": "source_index",
    "type": "source_type",
    "query": {
      "term": { "year": 1972 }
    }
  },
  "dest": {
    "index": "dest_index",
    "type": "dest_type"
  }
}

*Note: The reindex API does not copy settings or mappings from the source index, so create the target index with the settings/mappings you need before making the actual API call.
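
A minimal sketch of that two-step order (create the destination first, then reindex into it) could look like this. It only builds and prints the requests. The field types, the index name `70s_movies`, and the helper name `build_copy_plan` are assumptions for illustration, and the typeless `mappings.properties` layout assumes Elasticsearch 7.x or later.

```python
import json


def build_copy_plan(source_index, dest_index, year):
    """Return (method, path, body) steps: create the index, then reindex into it."""
    create_body = {
        "mappings": {
            "properties": {
                # Assumed field types; mirror the source index's real mapping.
                "title": {"type": "text"},
                "director": {"type": "text"},
                "year": {"type": "integer"},
            }
        }
    }
    reindex_body = {
        "source": {"index": source_index, "query": {"term": {"year": year}}},
        "dest": {"index": dest_index},
    }
    return [
        ("PUT", "/" + dest_index, create_body),
        ("POST", "/_reindex", reindex_body),
    ]


plan = build_copy_plan("movies", "70s_movies", 1972)
for method, path, body in plan:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Each printed step can be sent with curl or any HTTP client, in the order shown.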

For further reading on the difference between clone and reindex, refer to "What's the difference between cloning and reindexing an index in Elasticsearch?"

YDF
3

You can do it easily with elasticsearch-dump (https://github.com/taskrabbit/elasticsearch-dump) in three steps. The following example copies the index "thor" to "thor2":

elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=analyzer

elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=mapping

elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=data
jpereira
2

Well, the straightforward way to do this is to write code, with the API of your choice, that queries for "year": 1972 and then indexes that data into a new index. You would use the Search API or the Scan and Scroll API to get all the documents, and then either index them one by one or use the Bulk API:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
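
The scroll-then-bulk approach described above can be sketched as follows, using only the standard library. The helper names, the page size, the index name `70s_movies`, and the sample hit are assumptions made up for illustration. The code shows how one page of search hits becomes a newline-delimited Bulk payload and does not contact a cluster.

```python
import json


def build_scroll_search_body(year, page_size=500):
    """Body for the initial POST /movies/_search?scroll=1m request."""
    return {"size": page_size, "query": {"term": {"year": year}}}


def hits_to_bulk_payload(hits, dest_index):
    """Turn one page of search hits into an NDJSON body for POST /_bulk."""
    lines = []
    for hit in hits:
        # Action line: index the document under its original _id.
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(hit["_source"]))
    # The Bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


# Illustrative page of hits, shaped like response["hits"]["hits"].
sample_hits = [
    {
        "_id": "1",
        "_source": {
            "title": "The Godfather",
            "director": "Francis Ford Coppola",
            "year": 1972,
        },
    }
]

print(json.dumps(build_scroll_search_body(1972)))
print(hits_to_bulk_payload(sample_hits, "70s_movies"), end="")
```

In a full loop you would send the first body with `?scroll=1m`, post each page's payload to `/_bulk` with `Content-Type: application/x-ndjson`, and keep requesting `/_search/scroll` with the returned `scroll_id` until a page comes back empty.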

Assuming you don't want to do this via code but are looking for a more direct way, I suggest Elasticsearch Snapshot and Restore. Basically, you would take a snapshot of your existing index, restore it under a new index name, and then use Delete By Query to delete all documents with a year other than 1972.

Snapshot And Restore

The snapshot and restore module allows you to create snapshots of individual indices or an entire cluster into a remote repository. At the time of the initial release only the shared file system repository was supported, but now a range of backends are available via officially supported repository plugins.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html

Delete By Query API

The delete by query API allows you to delete documents from one or more indices and one or more types based on a query. The query can either be provided using a simple query string as a parameter, or using the Query DSL defined within the request body.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
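
As a rough sketch of the restore-then-prune idea, the two request bodies might be built as below. The repository name `my_backup`, the snapshot name `snapshot_1`, the index name `70s_movies`, and the helper names are hypothetical. The `_delete_by_query` path assumes Elasticsearch 5.0 or later.

```python
import json


def build_restore_body(source_index, dest_index):
    """Restore one index from a snapshot under a different name."""
    return {
        "indices": source_index,
        # rename_pattern is a regular expression; a plain name matches itself.
        "rename_pattern": source_index,
        "rename_replacement": dest_index,
    }


def build_delete_other_years_body(year):
    """Match every document whose year is NOT the one to keep."""
    return {"query": {"bool": {"must_not": {"term": {"year": year}}}}}


steps = [
    # Hypothetical repository and snapshot names.
    ("POST", "/_snapshot/my_backup/snapshot_1/_restore",
     build_restore_body("movies", "70s_movies")),
    ("POST", "/70s_movies/_delete_by_query",
     build_delete_other_years_body(1972)),
]

for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The delete step should only run once the restore has finished, otherwise documents restored later would survive the pruning.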

John Petrone
2

The _clone API, introduced in v7.4, can easily satisfy your need (read the documentation for the relevant prerequisites and the monitoring involved):

POST /<index>/_clone/<target-index>

Or:

PUT /<index>/_clone/<target-index>
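
One prerequisite worth spelling out is that the source index must be blocked for writes before it can be cloned. A small sketch of the resulting request sequence follows. The index names and the helper name `build_clone_steps` are assumptions for illustration, and the code only prints the requests.

```python
import json


def build_clone_steps(source_index, target_index):
    """Return (method, path, body) steps: block writes, then clone."""
    return [
        # Make the source index read-only for writes, as cloning requires.
        ("PUT", f"/{source_index}/_settings",
         {"settings": {"index.blocks.write": True}}),
        # Clone the whole index; this request needs no body.
        ("POST", f"/{source_index}/_clone/{target_index}", None),
    ]


for method, path, body in build_clone_steps("movies", "movies_copy"):
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

The write block stays on the source index until that setting is changed back, so remember to lift it if the index should keep accepting writes.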
mork
1

You can use elasticdump --searchBody:

# Copy documents from movies to 70s_movies, filtering with a query
# (index names must be lowercase, so "70sMovies" is written as 70s_movies)
elasticdump \
  --input=http://localhost:9200/movies \
  --output=http://localhost:9200/70s_movies \
  --type=data \
  --searchBody="{\"query\":{\"term\":{\"year\": 1972}}}" # <--- Your query here

more on elasticdump options here.

Bilal