
I have sets of verbose logfiles that I will repeatedly re-grep in the course of solving a problem.

I usually have sets of 50-150 files totalling about 1-10 GB, which I'll spend a few hours with and then never look at again.

Even with an SSD and lots of RAM it can take a few dozen seconds to get results. grep also only pegs one core, so being able to search in parallel would be good too.
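
For context, the kind of parallel workaround I have in mind is just fanning grep out over the files (a rough sketch, assuming GNU xargs; the glob and pattern are placeholders). It helps with the single-core problem, but it still re-reads everything on every query:

    # run up to 8 greps at once, a few files per invocation
    find . -type f -name '*.log' -print0 \
      | xargs -0 -P 8 -n 4 grep -H -- 'some pattern'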

I'm wondering if I can do any better by indexing in some way. It would be nice to spend a few minutes up front to have better performance later.

Preferably it would be something I can run at the terminal in the directory, with an interface like grep. Then at the end I could delete the folder entirely, which would also delete the index.

Does this sound possible, and does something like this already exist? What's my next best option?


1 Answer

Your best bet is probably more complicated than you are willing to set up, given your requirements, such as they are.

Use a log aggregation stack that can read/tail the files for you (Fluentd), index them (Elasticsearch), and present a pretty interface (Kibana).
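
A minimal sketch of what that could look like, assuming Fluentd with the fluent-plugin-elasticsearch output installed and Elasticsearch listening locally (the log path, pos file, tag and port below are placeholders):

    <source>
      @type tail
      # point this at the directory you are currently grepping
      path /path/to/logs/*.log
      pos_file /tmp/fluentd-logs.pos
      read_from_head true
      tag mylogs
      <parse>
        @type none            # keep each raw line in the "message" field
      </parse>
    </source>

    <match mylogs>
      @type elasticsearch
      host localhost
      port 9200
      logstash_format true    # writes daily logstash-YYYY.MM.DD indices
    </match>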

Just configure it to delete them as often as you like.
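
You can also query and clean up from the terminal with curl against Elasticsearch's REST API. For example (the index names assume the logstash_format naming above, and "timeout" and the date are just sample values):

    # grep-like full-text query across the indexed lines
    curl -s 'http://localhost:9200/logstash-*/_search?q=message:timeout&size=50'

    # throw an index away once you are done with that set of logs
    curl -X DELETE 'http://localhost:9200/logstash-2024.05.01'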

This is only one solution stack; check out Logstash as well, among many others.