
I am trying to index sample CSV data into OpenDistro Elasticsearch, but the index is not being created. Could you please let me know what I am missing here?

csv file to index

[admin@fedser32 logstashoss-docker]$ cat /tmp/student.csv 
"aaa","bbb",27,"Day Street"
"xxx","yyy",33,"Web Street"
"sss","mmm",29,"Adam Street"

logstash.conf

[admin@fedser32 logstashoss-docker]$ cat logstash.conf
input {
  file {
    path => "/tmp/student.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["firstname", "lastname", "age", "address"]
  }
}
output {
    elasticsearch {
        hosts => ["https://fedser32.stack.com:9200"]
        index => "sampledata"
        ssl => true
        ssl_certificate_verification => false
        user => "admin"
        password => "admin@1234"
    }
}
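For reference, the csv filter above maps each row's values positionally onto the configured column names. A minimal Python sketch of that mapping, using the sample rows from the question (note that, like the Logstash csv filter without a `convert` option, every field comes out as a string):

```python
import csv
import io

# Sample rows from /tmp/student.csv
data = (
    '"aaa","bbb",27,"Day Street"\n'
    '"xxx","yyy",33,"Web Street"\n'
    '"sss","mmm",29,"Adam Street"\n'
)

columns = ["firstname", "lastname", "age", "address"]

# Zip each parsed row against the column list, as the csv filter does.
events = [dict(zip(columns, row)) for row in csv.reader(io.StringIO(data))]

print(events[0])
# {'firstname': 'aaa', 'lastname': 'bbb', 'age': '27', 'address': 'Day Street'}
```

If you want `age` indexed as a number, the csv filter supports `convert => { "age" => "integer" }`.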

My Opendistro cluster is listening on 9200 as shown below.

[admin@fedser32 logstashoss-docker]$ curl -X GET -u admin:admin@1234 -k https://fedser32.stack.com:9200
{
  "name" : "odfe-node1",
  "cluster_name" : "odfe-cluster",
  "cluster_uuid" : "5GOEtg12S6qM5eaBkmzUXg",
  "version" : {
    "number" : "7.10.0",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
    "build_date" : "2020-11-09T21:30:33.964949Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

As per the logs, it does indicate that it is able to find the CSV file, as shown below.

logstash_1  | [2022-03-03T12:11:44,716][INFO ][logstash.outputs.elasticsearch][main] Index Lifecycle Management is set to 'auto', but will be disabled - Index Lifecycle management is not installed on your Elasticsearch cluster
logstash_1  | [2022-03-03T12:11:44,716][INFO ][logstash.outputs.elasticsearch][main] Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
logstash_1  | [2022-03-03T12:11:44,725][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>125, "pipeline.sources"=>["/usr/share/logstash/pipeline/logstash.conf"], :thread=>"#<Thread:0x5c537d14 run>"}
logstash_1  | [2022-03-03T12:11:45,439][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>0.71}
logstash_1  | [2022-03-03T12:11:45,676][INFO ][logstash.inputs.file     ][main] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/usr/share/logstash/data/plugins/inputs/file/.sincedb_20d37e3ca625c7debb90eb1c70f994d6", :path=>["/tmp/student.csv"]}
logstash_1  | [2022-03-03T12:11:45,697][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
logstash_1  | [2022-03-03T12:11:45,738][INFO ][filewatch.observingtail  ][main][2f140d63e9cab8ddc711daddee17a77865645a8de3d2be55737aa0da8790511c] START, creating Discoverer, Watch with file and sincedb collections
logstash_1  | [2022-03-03T12:11:45,761][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
logstash_1  | [2022-03-03T12:11:45,921][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

sb9
  • One possibility is that logstash has already read the content of the file, so it won't read the content again on further executions (cf [the doc](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#_tracking_of_current_position_in_watched_files)). One way to check if that's the case would be to use a new file or add new lines to the existing file; also, when debugging, it can be useful to have a stdout output plugin so that you can see what logstash is doing. – baudsp Mar 03 '22 at 12:58
  • Hi @baudsp, I am running logstash using docker compose. I tried changing the csv file name and updated logstash.conf with the plugin output { stdout { codec => rubydebug } }, but nothing shows in the docker container logs and no index gets created. – sb9 Mar 03 '22 at 13:19
  • Hi @baudsp, I tried with elasticsearch as a service and it was able to index the data and write it to the console logs: [admin@fedser32 logstashoss-docker]$ sudo systemctl status elasticsearch.service [admin@fedser32 logstashoss-docker]$ sudo /usr/share/logstash/bin/logstash -f ./logstash.conf (versions logstash-7.17.1-1.x86_64, elasticsearch-7.17.1-1.x86_64). Nothing changed except that the ssl setting for the elasticsearch endpoint was disabled in logstash.conf. Not sure what's wrong with the 'logstash-oss:7.9.1' and 'opendistro-for-elasticsearch:1.12.0' image setup, which is not working. – sb9 Mar 03 '22 at 13:46
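As suggested in the comments, a stdout output makes it easy to see whether events are reaching the output stage at all. A sketch of the output section with the debug plugin added alongside the existing elasticsearch output (same hosts and credentials as in the question):

```
output {
    stdout { codec => rubydebug }
    elasticsearch {
        hosts => ["https://fedser32.stack.com:9200"]
        index => "sampledata"
        ssl => true
        ssl_certificate_verification => false
        user => "admin"
        password => "admin@1234"
    }
}
```

If events print to stdout but no index appears, the problem is on the Elasticsearch side; if nothing prints, the file input is not producing events.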

2 Answers


Could you check the access rights for the /tmp/student.csv file? It must be readable by the logstash user. Check with this command:

# ls -l /tmp

Otherwise, if you have already indexed the file path, you have to clean up the sincedb.
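For repeated testing, a common alternative to deleting the sincedb file is to point the file input at /dev/null, so no position is persisted and the file is re-read from the beginning on every run. A sketch of the input section:

```
input {
  file {
    path => "/tmp/student.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
```

This is only suitable for testing; in production you want the sincedb so restarts do not re-ingest old data.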

Idriss

The thing I was missing is that I had to volume mount my CSV file into the logstash container, as shown below, after which I was able to index my CSV data. The path in logstash.conf refers to the container's filesystem, not the host's, so without the bind mount the file input had nothing to read.

[admin@fedser opensearch-logstash-docker]$ cat docker-compose.yml 
version: '2.1'
services:
  logstash:
    image: opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2
    ports:
      - "5044:5044"
    volumes:
      - $PWD/logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - $PWD/student.csv:/tmp/student.csv
sb9