
The problem is that I need Logstash to ingest a CSV file stored in a public GitHub repository (accessed as a raw file). The CSV has no timestamp column, and I want Logstash to add one; otherwise it does not connect to Grafana.

The following is my pipeline code in Logstash:

input {
  http_poller {
    urls => {
      csv_data => {
        method => get
        test1 => "https://raw.githubusercontent.com/shujah-TRN-Infosys/dag-test/main/dag_meta_data.csv"
        headers => {
          Accept => "text/csv"
        }
      }
    }
    request_timeout => 60
    schedule => { cron => "* * * * * UTC" } #EveryMinute
    codec => "plain"
  }
}

filter {
  csv {
    separator => ","
    columns => ["dag_id","batch", "sor", "consumer", "application", "depends_on_ingestion", "depends_on_curation", "dag_type"] # Specify column names here
  }
  ruby {
    code => "event.set('@timestamp', LogStash::Timestamp.now)"
  }
}

output {
  elasticsearch {
    hosts => [ "xxxxxxx" ]
    user => "xxxxx" 
    password => "xxxx" 
    index => "meta_csv_data-%{+YYYY.MM.dd}"
  }
  #stdout { codec => rubydebug }
}

I have tried the pipeline code (sensitive info redacted) and waited a minute to see whether any indices appeared in the Index Management section of Elastic Cloud. There were none, so I assume it is not working. Any ideas on how to approach this, or is there something wrong with my pipeline code?

Paulo
  • Hi there, why don't you first check whether your data injection is working? Just inject the *.csv data into Elastic first; if there is no issue there, then you can look at the timestamp issue. – Farkhod Abdukodirov Jun 23 '23 at 01:06

1 Answer


TL;DR:

I believe this is a simple mistake when reading the documentation. You have set a key named test1 where it should have been named url.

Indeed, as per the documentation:

input {
  http_poller {
    urls => {
      test1 => "http://localhost:9200"
      test2 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
        user => "AzureDiamond"
        password => "hunter2"
        url => "http://localhost:9200/_cluster/health" # <= this key must be named "url"; it cannot be custom
        headers => {
          Accept => "application/json"
        }
     }
    }
    request_timeout => 60
    # Supports "cron", "every", "at" and "in" schedules by rufus scheduler
    schedule => { cron => "* * * * * UTC"}
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Also, since you are processing lines, you will want one event per line (\n). With the plain codec, the whole HTTP response body becomes a single event, so the csv filter would only parse its first row; use the line codec instead.

Solution:

Your input should look like this:

input {
  http_poller {
    urls => {
      csv_data => {
        method => get
        url => "https://raw.githubusercontent.com/shujah-TRN-Infosys/dag-test/main/dag_meta_data.csv"
      }
    }
    request_timeout => 60
    schedule => { cron => "* * * * * UTC" } #EveryMinute
    codec => "line"
  }
}
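One more detail to keep in mind: with the line codec, the CSV header row arrives as a regular event on every poll. The csv filter has a skip_header option that drops a row whose values match the configured columns (a sketch against your column list; check that the option is available in your Logstash version):

filter {
  csv {
    separator => ","
    columns => ["dag_id","batch","sor","consumer","application","depends_on_ingestion","depends_on_curation","dag_type"]
    skip_header => true # drop the line that repeats the column names
  }
}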

To reproduce

Since your URL was not accessible to me, I created a test of my own:

input {
  http_poller {
    urls => {
      csv_data => {
        method => get
        url => "https://raw.githubusercontent.com/elastic/ecs/main/generated/csv/fields.csv"
      }
    }
    request_timeout => 60
    schedule => { cron => "* * * * * UTC" } #EveryMinute
    codec => "line"
  }
}

filter {
  csv {
    separator => ","
    columns => ["ECS_Version","Indexed","Field_Set","Field","Type","Level","Normalization","Example","Description"] # Specify column names here
  }
  ruby {
    code => "event.set('@timestamp', LogStash::Timestamp.now)"
  }
}

output {
  stdout { codec => rubydebug }
}
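If events show up on stdout but no index appears in Elastic Cloud, you can keep both outputs active while testing, since Logstash allows multiple outputs in the same block (credentials redacted as in the question):

output {
  elasticsearch {
    hosts => [ "xxxxxxx" ]
    user => "xxxxx"
    password => "xxxx"
    index => "meta_csv_data-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug } # keep console output while debugging
}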