
I wish to have a large number (e.g. a million) of log files on a system. But the OS has a limit on open files, and it is not efficient to create a million files in a single folder.

Is there a ready-made solution, framework, or database that will create log files and append data to them in an efficient manner?

I can imagine various techniques to optimize the management of a large number of log files, but there might be something that does this out of the box.
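To make that concrete, here is a minimal sketch of one such technique - sharding files across subdirectories by a hash of the user ID, so no single folder holds a million entries. The `LogSharding` class and the 256 × 256 layout are just my illustration, not an existing library:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public final class LogSharding {
    private static final Path ROOT = Paths.get("/var/log/userlogs");

    // Spread files over 256 * 256 = 65536 subdirectories, so a million
    // files means only ~15 entries per directory on average.
    static Path logPathFor(long userId) {
        int h = Long.hashCode(userId);
        int d1 = (h >>> 8) & 0xFF;   // first-level directory, 00..ff
        int d2 = h & 0xFF;           // second-level directory, 00..ff
        return ROOT.resolve(String.format("%02x/%02x/user-%d.log", d1, d2, userId));
    }
}
```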

E.g. I wish each log file to be re-created every day or when it reaches 50 MB. Old log files must be stored, e.g. uploaded to Amazon S3.
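A rough sketch of the rotation rule I mean (new day or 50 MB, whichever comes first); `uploadToS3` is a placeholder for whatever archival step follows:

```java
import java.io.IOException;
import java.nio.file.*;
import java.time.LocalDate;
import java.time.ZoneId;

final class LogRotation {
    private static final long MAX_BYTES = 50L * 1024 * 1024;

    // Rotate if the file is from a previous day or has reached 50 MB.
    static void rotateIfNeeded(Path log) throws IOException {
        if (!Files.exists(log)) return;
        LocalDate lastWrite = Files.getLastModifiedTime(log)
                .toInstant().atZone(ZoneId.systemDefault()).toLocalDate();
        boolean stale = lastWrite.isBefore(LocalDate.now());
        boolean tooBig = Files.size(log) >= MAX_BYTES;
        if (stale || tooBig) {
            Path archived = log.resolveSibling(log.getFileName() + "." + lastWrite);
            Files.move(log, archived, StandardCopyOption.ATOMIC_MOVE);
            uploadToS3(archived);  // placeholder for the archival step
        }
    }

    private static void uploadToS3(Path file) { /* e.g. via the AWS SDK */ }
}
```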

I can imagine a log database that writes all logs into a single file, and a later process that appends the records to the millions of individual files.
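In other words, something like this sketch: the fast path appends every record to a single journal (one descriptor, sequential writes), and a periodic pass fans the records out into the per-user files. The tab-separated journal format is only an assumption for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.List;

final class JournalFanOut {
    private static final Path JOURNAL = Paths.get("/var/log/journal.log");

    // Fast path: all writers append to a single journal file.
    static synchronized void log(long userId, String message) throws IOException {
        String line = userId + "\t" + message + "\n";
        Files.write(JOURNAL, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Slow path, run periodically: replay the journal into per-user files.
    // For brevity this reads the whole batch into memory.
    static void fanOut() throws IOException {
        if (!Files.exists(JOURNAL)) return;
        Path batch = JOURNAL.resolveSibling("journal.processing");
        Files.move(JOURNAL, batch, StandardCopyOption.ATOMIC_MOVE);
        List<String> lines = Files.readAllLines(batch, StandardCharsets.UTF_8);
        for (String line : lines) {
            int tab = line.indexOf('\t');
            long userId = Long.parseLong(line.substring(0, tab));
            Path target = LogSharding.logPathFor(userId);  // sharding sketch above
            Files.createDirectories(target.getParent());
            Files.write(target, (line.substring(tab + 1) + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        Files.delete(batch);
    }
}
```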

Maybe there is a special file system that is good for such a task. I can't find anything, but I am sure there might be a solution.

PS: I wish to run logging on a single server. I say 1 million because it is more than the default limit on open files. 1 million files of 1 MB each is 1 TB, which could be stored on a regular hard drive.

I am looking for an existing solution before I write my own. I am sure there might be a class of logging servers for this; I just do not know how to search for them.

Tema
  • Why not just use some standard logging framework like log4J and create a simple scheduled task that moves log files once per day or when a file grows larger than ... KB? – m.antkowicz Jun 23 '17 at 10:31
  • E.g. I have 1 million users and 10,000 of them will log something today. I wish to have 1 log file per user. I do not wish to reinvent the wheel if there is some solution to manage many log files. There might be a cache of open files, routines to flush open files every X seconds (see the sketch after these comments). – Tema Jun 23 '17 at 11:01
  • To be honest, I believe that having a log file for each user is a result of really bad design - you should really think about another approach imo – m.antkowicz Jun 23 '17 at 13:47
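
To illustrate the cache of open files mentioned above: a minimal sketch using an access-ordered `LinkedHashMap` as an LRU that closes the least recently used writer once too many files are open. The cap of 1,000 is an arbitrary number for the example:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.LinkedHashMap;
import java.util.Map;

final class WriterCache {
    private static final int MAX_OPEN = 1000;  // arbitrary cap for illustration

    // Access-ordered LinkedHashMap: evicts (and closes) the least
    // recently used writer once the cap is exceeded.
    private final Map<Path, BufferedWriter> open =
        new LinkedHashMap<Path, BufferedWriter>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Path, BufferedWriter> eldest) {
                if (size() <= MAX_OPEN) return false;
                try { eldest.getValue().close(); } catch (IOException ignored) {}
                return true;
            }
        };

    synchronized void append(Path file, String line) throws IOException {
        BufferedWriter w = open.computeIfAbsent(file, p -> {
            try {
                Files.createDirectories(p.getParent());
                return Files.newBufferedWriter(p,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) { throw new UncheckedIOException(e); }
        });
        w.write(line);
        w.newLine();
    }

    // Call every X seconds from a scheduled task so buffered data hits disk.
    synchronized void flushAll() throws IOException {
        for (BufferedWriter w : open.values()) w.flush();
    }
}
```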

1 Answer


I would start by thinking of Cassandra or Hadoop as a store for the log data, and eventually, if you want these data in the form of files, write a procedure that selects from one of these databases and places the records in formatted files.
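
For example, the write path could look roughly like this with the DataStax Java driver 4.x; the `logs.user_logs` table and its layout are assumptions of mine, with the user ID as the partition key so each user's records stay together and are clustered by time:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import java.time.Instant;

public final class CassandraLogStore {
    // Assumed schema:
    // CREATE TABLE logs.user_logs (
    //   user_id bigint, ts timestamp, line text,
    //   PRIMARY KEY (user_id, ts));
    private final CqlSession session = CqlSession.builder()
            .withKeyspace("logs").build();
    private final PreparedStatement insert = session.prepare(
            "INSERT INTO user_logs (user_id, ts, line) VALUES (?, ?, ?)");

    public void log(long userId, String line) {
        session.execute(insert.bind(userId, Instant.now(), line));
    }
}
```

Exporting to files then becomes a range scan per user rather than a matter of managing a million open descriptors.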

  • The problem with a regular database is that it is designed to index data. It is not efficient to write logs to a regular database. – Tema Jun 23 '17 at 11:03
  • Cassandra is designed for clusters. I am looking for a solution to run on a single system with a single drive. – Tema Jun 23 '17 at 11:12
  • From what I see, if you set up a single-node HDFS server, you can set the configuration parameter dfs.namenode.fs-limits.max-directory-items to 0 and have no limit on the number of files in a single directory (limited only by available memory). And then just use this in your app. – Tomasz Kosiński Jun 23 '17 at 11:46
  • The question is whether you really need to have 10k files or more open at the same time. On Linux I see the default limit per process is 64k; use the ulimit -n command to get this information. – Tomasz Kosiński Jun 23 '17 at 11:58
  • Some interesting information on how to change the per-process open-files limit on a Linux server is here: https://stackoverflow.com/questions/3734932/max-open-files-for-working-process – Tomasz Kosiński Jun 23 '17 at 11:59
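
For completeness, the same numbers can be read from inside a JVM process via the `com.sun.management` extension (available on HotSpot JDKs on Unix-like systems); this only reports the limits, it does not change them:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public final class FdLimits {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            // Mirrors what `ulimit -n` reports for this process.
            System.out.println("open file descriptors: " + unix.getOpenFileDescriptorCount());
            System.out.println("max file descriptors:  " + unix.getMaxFileDescriptorCount());
        }
    }
}
```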