Our servers store log files in directories named by date, with the time as the file name, e.g.:

/2015.08.21/01.23
/2015.08.21/01.24
/2015.08.21/01.25

where the file names follow the [hours].[minutes] convention.

How can I configure Logstash to read from the latest file? What is the general practice?

  • 1st attempt:

I set the path to:

path => ["/2015.08.21/*"]

Logstash opened a lot of files, until the log source (Linux) would not allow it to open any more.

  • 2nd attempt:

Use rsync to copy all the files and merge them into a single file.

However, this has a problem with partial log files: if the current log file is 11.12 and it is still being written, I will only get partial data.

  • 3rd attempt:

Periodically create a symbolic link that points to the latest file. I haven't tried this yet, but I think it should work; I need to figure out how to create a simple scheduler in Linux.

  • Update

I have tried the 3rd attempt, but I see some drawbacks with this method.

  1. If Logstash spends more than 1 minute processing a file, it will not be able to process the whole file, because the symbolic link will already point to the next file.
  2. The last 1 or 2 events may not get processed.
janetsmith

1 Answer

The first time you run this config, it seems reasonable that Logstash would want to open a lot of files. In that case, consider raising the limit on the number of open files available to the process.

Once it has processed the file, it will detect that it is not being written to and not keep the file open. It will check the file periodically to make sure nothing new has been written. So, once you're caught up, it should be friendlier.
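As a rough sketch of how that idle-file behaviour might be tuned: assuming a version of the file input plugin recent enough to support close_older and max_open_files (older releases may not have them), something like the following could cap the number of handles held open at once. The values are illustrative guesses, not recommendations.

```
input {
  file {
    path => ["/2015.08.21/*"]
    # Re-check known files for new data every 5 seconds (illustrative value).
    stat_interval => 5
    # Close a file handle once the file has had no new data for 2 minutes;
    # the file is reopened if it grows again.
    close_older => 120
    # Upper bound on the handles this input holds at once (assumed here to
    # sit below whatever per-account limit applies).
    max_open_files => 50
  }
}
```

If the installed plugin rejects these settings as unknown, raising the operating-system limit on open files is the remaining option.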

To help catch up on the initial run, try setting the pattern to something smaller, like:

path => ["/2015.08.21/01.*"]

which should only match 60 files.
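A fuller input block for the catch-up run might look like the sketch below. It assumes that start_position => "beginning" is wanted so that existing file content is read instead of only newly appended lines; the commented-out second pattern is a hypothetical way to take in several hours at once.

```
input {
  file {
    # One hour at a time: at most 60 files.
    path => ["/2015.08.21/01.*"]
    # Alternatively, several hours via a character class:
    # path => ["/2015.08.21/0[1-5].*"]
    # Read files from their first line on first sight.
    start_position => "beginning"
  }
}
```

Note that start_position only affects files Logstash has not seen before; for files already recorded in its sincedb, reading resumes from the saved position.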

You might also reconsider your design of having one file per minute; without more information, it seems excessive.

Alain Collins
  • Thanks for the suggestion. However, if I start Logstash at 8am, do I have to try it with 01.*, 02.*, 03.*? That will be a bit of work. I have no control over the log policy :( I am not DevOps. – janetsmith Aug 26 '15 at 03:40
  • Logstash is intended to be left running. You can also make patterns like "0[1-5]*". – Alain Collins Aug 26 '15 at 04:23
  • Logstash gave me a "file open permission error" when it read the 100th file; I guess my account is only allowed to read 100 files concurrently. And you are right, Logstash should be left running. I am doing proof-of-concept work in which Logstash runs on my own laptop, so it is not running all the time. – janetsmith Aug 26 '15 at 06:41
  • Try adjusting your ulimit, assuming you're on Linux. – Alain Collins Aug 26 '15 at 13:47
  • I've checked the ulimit on the log server; it's 65535. I actually mounted the directory on my machine using sshfs, so I am guessing that either sshfs or my SSH account has this file-number limit. – janetsmith Aug 27 '15 at 05:24