
I have a CSV file with 1000 rows and 3 columns, as follows:

field1,field2,field3
ABC,A65,ZZZ
...

I want to export its content into the mapping myrecords of the index myindex (I have more mappings in this index):

PUT /myindex
{
  "mappings": {
    "myrecords": { 
      "_all": {
        "enabled": false
      },
      "properties": {
          "field1":  { "type": "keyword" },
          "field2":  { "type": "keyword" },
          "field3":  { "type": "keyword" }
       }
    }
  }
}

Is there any easy way to do it?
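
To be clear, every CSV row should become its own document in that type. For example, with the sample row above, the desired result would look roughly like this (the document ID 1 is just an illustrative choice):

PUT /myindex/myrecords/1
{
  "field1": "ABC",
  "field2": "A65",
  "field3": "ZZZ"
}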

UPDATE:

I executed this Logstash config file, but even though the CSV is small (1000 rows), the process runs forever. When I execute GET /myindex/myrecords/_search, I only ever see 1 record.

input {
    file {
        path => ["/usr/develop/data.csv"]
        start_position => "beginning"
    }
}

filter {  
    csv {
        columns => ["field1","field2","field3"]
        separator => ","
    }
}

output {
    stdout { codec => rubydebug }
    elasticsearch {
        action => "index"
        hosts => ["127.0.0.1:9200"]
        index => "myindex"
        document_type => "myrecords"
        document_id => "%{Id}"  # Here I also tried "%{field1}"
        workers => 1
    }
}
Dinosaurius
  • This answer might help: https://stackoverflow.com/questions/35433121/how-to-import-csv-data-using-logstash-for-field-type-completion-in-elasticsearch/35433231#35433231 – Val Sep 04 '17 at 09:09
  • @Val: Thanks. I tried this solution, but the data is not exported (only 1 record). I assume that the problem occurs due to `document_id => "%{Id}"`. I do not have the field `Id` in my mapping. How can I solve this issue? Please see my updated question. – Dinosaurius Sep 04 '17 at 11:36
  • Simply remove the `document_id` setting. – Val Sep 04 '17 at 11:39
  • @Val: I tried it. In the terminal I see this info all the time, but no data is returned by `GET` from Elastic: `[2017-09-04T13:40:49,886][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>500} [2017-09-04T13:40:50,091][INFO ][logstash.pipeline ] Pipeline main started [2017-09-04T13:40:50,140][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600} ` – Dinosaurius Sep 04 '17 at 11:41
  • I also see this in logs of Elastic: `[2017-09-04T13:42:08,638][INFO ][o.e.c.r.a.DiskThresholdMonitor] [LOQmMI1] low disk watermark [85%] exceeded on [LOQmMI11SJyen1gQOlXnug][LOQmMI1][/usr/local/elasticstack_5.5.0/elasticsearch-5.5.0/data/nodes/0] free: 15.3gb[11%], replicas will not be assigned to this node` – Dinosaurius Sep 04 '17 at 11:42
  • Try to add `sincedb_path => "/dev/null"` in your `file` input. – Val Sep 04 '17 at 11:47
  • @Val: Now it works. Thanks. Could you please post the answer and explain the meaning of `sincedb_path`? – Dinosaurius Sep 04 '17 at 11:53
  • You should consider upvoting the answer I linked to instead. `sincedb_path` is simply a file that allows Logstash to remember the lines it has already parsed. Setting it to /dev/null will make sure that Logstash starts at the beginning of the file on every run. – Val Sep 04 '17 at 11:54
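
Putting Val's suggestions together, a minimal sketch of the adjusted `file` input (same path as above; `sincedb_path => "/dev/null"` is the only addition, and the `document_id` setting is dropped from the elasticsearch output):

input {
    file {
        path => ["/usr/develop/data.csv"]
        start_position => "beginning"
        # Do not persist the read position, so the whole CSV is re-read on every run
        sincedb_path => "/dev/null"
    }
}

With `document_id` removed, Elasticsearch generates a unique ID for each event, so all 1000 rows end up as separate documents instead of repeatedly overwriting the same one.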
