I am trying to use Logstash 5.5 to analyze archived (.gz) files that are generated every minute. Each .gz file contains a single CSV file. My .conf file looks like this:

input {
  file {
    type => "gzip"
    path => [ "C:\data*.gz" ]
    start_position => "beginning"
    sincedb_path => "gzip"
    codec => gzip_lines
  }
}

filter {
  csv {
    separator => ","
    columns => ["COL1","COL2","COL3","COL4","COL5","COL6","COL7"]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "mydata"
    document_type => "zipdata"
  }
  stdout {}
}

Initially I was getting an error about the missing gzip_lines plugin, so I installed it. After installing the plugin, Logstash reports "Successfully started Logstash API endpoint", but nothing gets indexed: the Logstash logs show no indexing activity, and the index does not show up in Kibana. In other words, Logstash is not putting any data into Elasticsearch.

Maybe I am using the wrong configuration. What is the correct way of doing this?
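Each line in the CSV files is a plain comma-separated record with seven fields, one per column named in the filter above; a made-up example line (the values are hypothetical) would look like:

a1,b2,c3,d4,e5,f6,g7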

sgmbd
  • Do you see anything on logstash's stdout? – Imma Aug 24 '17 at 18:15
  • When I run in debug, I see the following on the Logstash console: CSV parsing options {:col_sep=>",", :quote_char=>"\""} Pipeline main started _globbed_files: C:\data*.gz*.gz: glob is: [] Starting puma Successfully started Logstash API endpoint {:port=>9600} Pushing flush onto pipeline – sgmbd Aug 24 '17 at 18:33
  • It could be related to the codec plugin not working as expected. I'd suggest you try the same configuration on a plain CSV file, without the gzip_lines codec, to narrow the problem down. – Imma Aug 25 '17 at 07:00
  • Found one issue with my configuration: I was using the wrong slash in the path attribute. After correcting it, Logstash starts reading each of the .gz files but reports an error while reading (a corrected input sketch follows these comments). Pasting the logs here: observe_read_file: general error reading C:/data1.csv.gz - error: java.lang.IllegalArgumentException: Object: ?Gq?Y data1.csv ?ýI²u;®&ösÀy×XM?)²?2#ô?Y¦eGó?E. ðãvÏÛ8¿ûKD?ÿüßÿ·ÿöÿÿûÿó¿ÿßÿûÿö÷¿ü×ÿçÿþÿïÿëÿöü¿Âßÿòýçÿ1~ÿ|ÿïÿüýoãÿ?Âÿãuã¿Zc¾ÿÿþþ#ºÚß_ÿóý?ÿ^ê¯ – sgmbd Aug 25 '17 at 07:01
  • @Imma, yes, I tried the same on plain .csv files without the gzip_lines codec, and it works perfectly fine there. – sgmbd Aug 25 '17 at 07:04
  • What do you have in the "gzip" file? That file (you indicated this name in the `sincedb_path` configuration) holds a pointer to the latest line of the file that has been sent and is used to avoid duplicates. If you have, for example, run this logstash instance more than once, it could be that files have been completely transferred and that could explain why no more data is sent. – whites11 Aug 25 '17 at 07:12
  • I have a single .csv file in each of the .gz files. – sgmbd Aug 25 '17 at 07:30
  • Following the different threads, I built the gzip_lines plugin on my (offline) Linux server. Also, I no longer reference the .gz files directly from my Logstash conf file; instead I list their absolute paths, one per line, in a .txt file. But Logstash is still not working as expected: it reads that text file line by line and does not parse the .gz files mentioned in it. – sgmbd Aug 26 '17 at 06:10
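Putting the comment findings together: the debug output above shows glob is: [], so the backslash pattern never matched any files, and the sincedb file named "gzip" records read positions, so files already marked as read are not picked up again on a rerun. A corrected input block might look like the sketch below; the forward-slash pattern is inferred from the C:/data1.csv.gz path in the error log, and sincedb_path => "NUL" (the Windows counterpart of /dev/null, which discards position tracking while testing) is an assumption, not part of the original config. Note that this only addresses the file matching; the java.lang.IllegalArgumentException raised by gzip_lines remains a separate, open problem.

input {
  file {
    # forward slashes so the glob matches files such as C:/data1.csv.gz
    path => [ "C:/data*.gz" ]
    start_position => "beginning"
    # "NUL" discards sincedb state on Windows while testing (assumption)
    sincedb_path => "NUL"
    codec => gzip_lines
  }
}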

0 Answers