Our servers store log files in directories named by date, with the time as the file name, e.g.:
/2015.08.21/01.23
/2015.08.21/01.24
/2015.08.21/01.25
where the file names follow the [hours].[minutes] convention.
How can I configure Logstash to read from the latest file? What is the general practice?
- 1st attempt:
I set the path to:
path => ["/2015.08.21/*"]
Logstash opened a large number of files, until the log source machine (Linux) would not allow it to open any more.
- 2nd attempt:
Use rsync to copy all the files and merge them into a single file.
However, I have a problem dealing with the partial log file. For example, if the current log file is 11.12 and it is still being written, I will only have partial data for it.
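For illustration, here is a minimal Python sketch of the merge step that skips the file still being written. It assumes that only the newest file (by name) in a date directory is incomplete, and that the merged file lives outside that directory. The function name `merge_completed_logs` and its parameters are hypothetical, not part of rsync or Logstash.

```python
import shutil
from pathlib import Path


def merge_completed_logs(day_dir, merged_path):
    """Concatenate every HH.MM file in day_dir except the newest one.

    Assumptions: only the newest file (by name) may still be written to,
    and merged_path is located outside day_dir.
    Returns the names of the merged files, in order.
    """
    # Zero-padded HH.MM names sort chronologically as plain strings.
    files = sorted(p for p in Path(day_dir).iterdir() if p.is_file())
    completed = files[:-1]  # skip the newest file; it may be partial
    with open(merged_path, "wb") as out:
        for path in completed:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return [p.name for p in completed]
```

The skipped file would be picked up by a later run, once a newer file has appeared next to it.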
- 3rd attempt:
Periodically create a symbolic link that points to the latest file. I haven't tried this yet, but I think it should work; I need to figure out how to create a simple scheduler in Linux.
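A minimal Python sketch of the symlink update, assuming the layout [log root]/[date]/[hours].[minutes] shown above and a link path outside the date directories. The function name `point_link_at_latest` and its parameters are hypothetical. The new link is created under a temporary name and renamed over the old one, so a reader never finds the link missing.

```python
import os
from pathlib import Path


def point_link_at_latest(log_root, link_path):
    """Repoint link_path at the newest log file under log_root.

    Assumes zero-padded YYYY.MM.DD directories and HH.MM file names,
    which sort chronologically as plain strings.
    Returns the target path, or None if no log file exists yet.
    """
    days = sorted(d for d in Path(log_root).iterdir() if d.is_dir())
    for day in reversed(days):
        files = sorted(f for f in day.iterdir() if f.is_file())
        if files:
            latest = files[-1].resolve()
            break
    else:
        return None  # no date directory contains a file
    # Create the new link under a temporary name, then rename it over
    # the old link in one step.
    tmp = f"{link_path}.tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(latest, tmp)
    os.replace(tmp, link_path)
    return str(latest)
```

As for the scheduler, a script that calls this function could be run once a minute from cron.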
- Update
I have tried the 3rd attempt, but I see some drawbacks to this method:
- If Logstash spends more than 1 minute processing a file, it will not be able to process the whole file, because the symbolic link will already point to the next file.
- The last 1 or 2 events may not get processed.