
I am trying to load a set of CSV files using Logstash. Each CSV file contains two tables, and only the second table is of interest to me. Any suggestions on how to skip the entries in the first table (say, the first 50 lines of the CSV file)?

My current conf file looks as follows:

input{
    file{
        path => "/home/username/pathtoData/*"
        start_position => "beginning"
    }
}
filter{
    csv{
        columns => ["col_name_a", "col_name_b", ...]
        separator => ","
    }
}
output{
    elasticsearch{
        hosts => ["localhost:portnum"]
    }
}

1 Answer


You didn't specify the structure of the two tables, but let's assume you have some way to tell them apart. For example, you could use a regular expression that counts the number of commas in each line.

Suppose any row with 5 commas is one that you don't want. You could conditionally send those rows to the drop filter:

filter {
    if [message] =~ /^([^,]*,){5}[^,]*$/ {
        drop {}
    }
}
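
For example, a line like a,b,c,d,e,f contains exactly 5 commas, so it would match that pattern and be dropped.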

Or, you could drop any row that does not have exactly 7 commas:

filter {
    if [message] !~ /^([^,]*,){7}[^,]*$/ {
        drop {}
    }
}

If you need more elaborate filtering, you could use the grok filter to examine each row more closely. So long as you have some condition you can filter on, you can use conditionals to drop the rows that you don't want.
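
For instance, here is a minimal sketch of that approach. The pattern and field names (id, name, rest) are hypothetical placeholders; adjust them to whatever the rows of your second table actually look like. Any line that grok cannot parse is tagged with _grokparsefailure, and those lines can then be dropped:

filter {
    grok {
        # Hypothetical layout: a numeric id, a quoted name, then the rest of the line.
        # Lines that do not match this pattern are tagged with _grokparsefailure.
        match => { "message" => "^%{NUMBER:id},%{QUOTEDSTRING:name},%{GREEDYDATA:rest}$" }
    }
    if "_grokparsefailure" in [tags] {
        drop {}
    }
}

This keeps only the rows that fit the expected layout of the second table, which is often more robust than counting commas if the first table happens to have the same number of fields.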
