0

How do I ignore old files and push only the latest log files from S3 using Logstash? We are using Logstash to push CloudTrail logs from S3 to Elasticsearch. CloudTrail logs are stored in the following format:

/AWSLogs/CloudTrail/xxxAccount Numberxxxx/aws-region/year(YYYY)/Month(MM)/day(DD)/

I need to pull only the latest data (e.g. data from the current month), as the entire bucket holds terabytes of data and Logstash is not able to scale to that much data. Is there a way to do this?
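One option worth trying: the S3 input plugin has a `prefix` setting, so you can point Logstash at the current month's key prefix instead of the whole bucket. A minimal sketch, assuming a hypothetical bucket name, account ID, and region (the prefix would have to be updated each month, e.g. by regenerating the config):

```
input {
  s3 {
    bucket => "my-cloudtrail-bucket"   # hypothetical bucket name
    region => "us-east-1"              # hypothetical region
    # Only list and fetch objects under the current month's prefix;
    # keys follow AWSLogs/CloudTrail/<account>/<region>/YYYY/MM/DD/
    prefix => "AWSLogs/CloudTrail/123456789012/us-east-1/2019/10/"
    codec  => "cloudtrail"             # assumes the cloudtrail codec plugin is installed
  }
}
```

Because S3 object listing is keyed on the prefix, Logstash never even enumerates the older objects, which is what makes this approach scale.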

user2416
  • Perhaps with the [ignore_older](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#plugins-inputs-file-ignore_older) option in the Logstash file input? This setting is in seconds, so for a month you'd have to set it to 30 * 24 * 60 * 60 = 2,592,000. – baudsp Oct 07 '19 at 12:39
  • @baudsp The ignore_older option is not supported by the S3 plugin. Is there any other way? – user2416 Oct 07 '19 at 17:07
  • I hadn't realized you were using S3, sorry. Perhaps you could parse the date in each log event and drop it if it's more than a month old (see [how](https://stackoverflow.com/a/30092806/6113627) to do it; a sketch follows below). – baudsp Oct 08 '19 at 12:11
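To make that comment concrete, here is a minimal filter sketch: it parses each event's eventTime into @timestamp (assuming the input produces an eventTime field, as the CloudTrail codec does) and cancels events older than roughly 30 days. Note that Logstash still downloads and parses every object with this approach; it only trims what reaches Elasticsearch.

```
filter {
  # Use the CloudTrail event's own timestamp instead of ingest time
  date {
    match => ["eventTime", "ISO8601"]
  }
  # Drop anything older than ~30 days (2,592,000 seconds)
  ruby {
    code => "event.cancel if event.get('@timestamp').to_f < Time.now.to_f - (30 * 24 * 60 * 60)"
  }
}
```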

2 Answers

3

I just had the same problem and solved it (read: worked around it) like this:

Start logstash with a normal config, which leads to the behaviour you described.

On startup it will tell you in its logs where its sincedb file is located (it defaults to logstash-7.8.0/data/plugins/inputs/s3/sincedb_someid).

The file takes a while to be created. Once it has been created, stop logstash again.

Now, I guess, you could delete the data that was just imported, but I didn't care to.

Now edit the file. It's just a UTC timestamp. Adjust it close to now.

Start logstash again and it will start processing files created after the timestamp you just put in.
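If you don't want to hunt for the auto-generated file, the S3 input also has a sincedb_path setting that pins it to a location you choose. A minimal sketch; the bucket name and path are assumptions, and the file's contents are the single UTC timestamp described above:

```
input {
  s3 {
    bucket       => "my-cloudtrail-bucket"   # hypothetical bucket name
    # Pin the sincedb so you always know which file to edit;
    # it holds one UTC timestamp, e.g. "2020-07-01 00:00:00 UTC"
    sincedb_path => "/var/lib/logstash/s3_cloudtrail.sincedb"
  }
}
```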

fko
0

You can move the logs to a different folder after they've been processed. This will keep you from processing them a second time, and will also make the processing much faster (we found s3/logstash to be extraordinarily slow with larger folders).

See the [backup_to_bucket](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html#plugins-inputs-s3-backup_to_bucket) option.
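A minimal sketch of that setup, with hypothetical bucket names: each processed object is copied to a backup bucket (optionally under a prefix) and then deleted from the source, which effectively moves it out of the watched folder:

```
input {
  s3 {
    bucket            => "my-cloudtrail-bucket"   # hypothetical source bucket
    prefix            => "AWSLogs/CloudTrail/"
    # Copy each object to another bucket once it has been processed...
    backup_to_bucket  => "my-cloudtrail-archive"  # hypothetical backup bucket
    backup_add_prefix => "processed/"
    # ...then remove it from the source bucket, effectively moving it
    delete            => true
  }
}
```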

Alain Collins
  • Logstash performance is very slow when using the backup_to_bucket option. – user2416 May 06 '20 at 21:00
  • It pretty much won't work at all if you let the files accumulate in the same folder, so backup or delete are about the only options. We got good performance with backups. – Alain Collins May 07 '20 at 22:18