6

I'm planning to use an Elasticsearch index to store a huge city database with ~2.9 million records, and use it as the search engine in my Laravel application.

The question is: I have the cities both in a MySQL database and in a CSV file. The file is ~300 MB.

What is the fastest way to import it into an index?

Elias Soares
  • Almost a duplicate, although [this one is about *re*-populating an index](http://stackoverflow.com/questions/21716002/importing-and-updating-data-in-elasticsearch). Similar question and similar scale of file size, though. – GolezTrol May 26 '15 at 22:53
  • @GolezTrol that question doesn't have a clear answer. I'm not able to extract a solution to my problem from it... :/ – Elias Soares May 26 '15 at 23:16
  • https://kevinkirsche.com/2014/08/25/using-logstash-to-import-csv-files-into-elasticsearch/ – Andrei Stefan May 27 '15 at 07:33

2 Answers

11

I solved this by doing the import with Logstash.

My import configuration is this:

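input {
    file {
        path => ["/home/user/location_cities.txt"]
        type => "city"
        # Read the file from the start instead of only tailing new lines
        start_position => "beginning"
    }
}

filter {
    csv {
        columns => ["region", "subregion", "ufi", "uni", "dsg", "cc_fips", "cc_iso", "full_name", "full_name_nd", "sort_name", "adm1", "adm1_full_name", "adm2", "adm2_full_name"]
        # Must be a literal tab character between the quotes (the file is tab-separated)
        separator => "	"
        # Drop Logstash bookkeeping fields so only the city columns are indexed
        remove_field => [ "host", "message", "path" ]
    }
}

output {
    elasticsearch {
        action => "index"
        # protocol/host/port are Logstash 1.x options; later releases use hosts => ["127.0.0.1:9200"] instead
        protocol => "http"
        host => "127.0.0.1"
        port => "9200"
        index => "location"
        workers => 4
    }
}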

This configuration imports a tab-separated file, with no quote delimiters around the fields, into an index called location with type city.

To run it, execute bin/logstash -f import_script_file from the folder where you installed/extracted Logstash.
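
For readers unfamiliar with the csv filter, the per-line mapping it performs can be sketched in Python. This is only an illustration of the idea, not how Logstash implements it: the `line_to_document` helper and the sample line are hypothetical, and only the column names are taken from the configuration above.

```python
import csv

# Column names copied from the Logstash configuration.
COLUMNS = ["region", "subregion", "ufi", "uni", "dsg", "cc_fips", "cc_iso",
           "full_name", "full_name_nd", "sort_name", "adm1", "adm1_full_name",
           "adm2", "adm2_full_name"]


def line_to_document(line):
    """Split one tab-separated line and pair each value with its column name."""
    # QUOTE_NONE: the file has no quote delimiters, so quotes are plain text.
    values = next(csv.reader([line], delimiter="\t", quoting=csv.QUOTE_NONE))
    return dict(zip(COLUMNS, values))


# Hypothetical sample line, made up for illustration only.
sample = "\t".join(["1", "0", "100", "200", "PPL", "BR", "BR", "Sao Paulo",
                    "Sao Paulo", "SAOPAULO", "27", "Sao Paulo", "", ""])
doc = line_to_document(sample)
print(doc["full_name"], doc["cc_iso"])
```

Each resulting dictionary corresponds to one document that ends up in the index.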

Elias Soares
0

For efficiency, you need to use the bulk API and experiment with the batch size (the number of documents sent per request) to find what works best for your data.

See Elasticsearch's documentation on bulk document indexing (importing).

If you use Python, take a look at https://pypi.python.org/pypi/esimport/0.1.9
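
As a rough sketch of what "bulk API with a tunable batch size" means in practice, the following builds the newline-delimited JSON bodies that bulk requests expect: one action line followed by one document line per record. The `bulk_bodies` helper, the sample documents, and the default batch size of 5000 are assumptions made for illustration, not taken from esimport or from any official client.

```python
import json
from itertools import islice


def bulk_bodies(documents, index_name, batch_size=5000):
    """Yield one newline-delimited JSON body per bulk request."""
    iterator = iter(documents)
    while True:
        # Take the next batch_size documents without loading everything in memory.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        lines = []
        for doc in batch:
            lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
            lines.append(json.dumps(doc))                                # document line
        # The bulk body must end with a trailing newline.
        yield "\n".join(lines) + "\n"


# Hypothetical documents, made up for illustration only.
docs = [{"full_name": "City %d" % i} for i in range(5)]
bodies = list(bulk_bodies(docs, "location", batch_size=2))
print(len(bodies))  # 3 requests: 2 + 2 + 1 documents
```

Each body would then be sent as a POST to the cluster's `_bulk` endpoint. Timing the import with a few different `batch_size` values is the experiment suggested above. Note that Elasticsearch versions from around the time of this question also expected a `_type` in the action line, so the exact action metadata depends on your version.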

Chanoch
Zouzias
  • Even with the bulk API this was very slow. I did the import using [Logstash](https://www.elastic.co/products/logstash), as I explained in my answer. Thanks for your help. – Elias Soares May 28 '15 at 03:23