
I have sets of verbose logfiles that I will repeatedly re-grep in the course of solving a problem.

I usually have sets of 50-150 files totalling about 1-10 GB, which I'll spend a few hours with and then never look at again.

Even with an SSD and lots of RAM it can take a few dozen seconds to get results. grep also only pegs one core, so being able to search in parallel would be good too.
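
For context, the kind of parallel workaround I have in mind is just fanning grep out over the files (a rough sketch, assuming GNU xargs; the glob and pattern are placeholders). It helps with the single-core problem, but it still re-reads everything on every query:

    # run up to 8 greps at once, a few files per invocation
    find . -type f -name '*.log' -print0 \
      | xargs -0 -P 8 -n 4 grep -H -- 'some pattern'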

I'm wondering if I can do any better by indexing in some way. It would be nice to spend a few minutes up front to have better performance later.

Preferably it would be something I can run at the terminal in the directory, with an interface like grep. Then at the end I could delete the folder entirely, which would also delete the index.

Does this sound possible, and does something like this already exist? What's my next best option?


1 Answer

Your best bet is probably more complicated than you are willing to set up, given your requirements, such as they are.

Use a log aggregation stack that can read/tail the files for you (Fluentd), index them (Elasticsearch), and present a pretty interface (Kibana).
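
A minimal sketch of what that could look like, assuming Fluentd with the fluent-plugin-elasticsearch output installed and Elasticsearch listening locally (the log path, pos file, tag and port below are placeholders):

    <source>
      @type tail
      # point this at the directory you are currently grepping
      path /path/to/logs/*.log
      pos_file /tmp/fluentd-logs.pos
      read_from_head true
      tag mylogs
      <parse>
        @type none            # keep each raw line in the "message" field
      </parse>
    </source>

    <match mylogs>
      @type elasticsearch
      host localhost
      port 9200
      logstash_format true    # writes daily logstash-YYYY.MM.DD indices
    </match>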

Just configure it to delete them as often as you like.
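
You can also query and clean up from the terminal with curl against Elasticsearch's REST API. For example (the index names assume the logstash_format naming above, and "timeout" and the date are just sample values):

    # grep-like full-text query across the indexed lines
    curl -s 'http://localhost:9200/logstash-*/_search?q=message:timeout&size=50'

    # throw an index away once you are done with that set of logs
    curl -X DELETE 'http://localhost:9200/logstash-2024.05.01'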

This is only one solution stack; check out Logstash as well, among many others.