Consider a Java application that receives financial trading transactions and determines their validity by applying several checks, e.g. whether the transaction is allowed under contractual and legal constraints. The application implements a JMS message handler that receives messages on one queue and sends the result back to the consumer on a second queue.
In order to measure response times and enable post-processing performance analysis, the application logs the start and end time of several steps, e.g. receiving the message, processing it, and preparing and sending the answer back to the client. The application receives approx. 3 million messages per day, and hence produces a multiple of that number of time measurements (around 18 million logged measurements a day). Each measurement consists of the following data: a measurement ID (e.g. RECEIVE_START/END, PROCESS_START/END, SEND_START/END), a timestamp as returned by System.nanoTime(), and a unique message ID. The time measurements are written to a log file.
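To make the data concrete, here is a minimal sketch of what writing one such measurement could look like; the field order, the pipe separator, and the function name are illustrative assumptions, not the actual log format:

```python
import time

# Illustrative only: one measurement = (measurement ID, nanosecond
# timestamp, unique message ID), written as one pipe-separated line.
def log_measurement(measurement_id, message_id, out):
    # time.monotonic_ns() is a Python stand-in for Java's System.nanoTime()
    ts = time.monotonic_ns()
    out.write(f"{measurement_id}|{ts}|{message_id}\n")
```

So a RECEIVE_START line for message msg-001 would look like `RECEIVE_START|123456789|msg-001`.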
To find the processing times, the log file is transformed and loaded into a MySQL database on a daily basis. This is done by a sequence of Python scripts that take the raw log data, transform it, and store it in a MySQL table, where each record corresponds to one processed message, with each measurement in its own column (i.e. the table groups records by the unique message ID).
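The core of that pivot step can be sketched as follows; it assumes the pipe-separated line format from above, which is my illustration rather than the real log format:

```python
from collections import defaultdict

# Sketch of the transform: group raw measurement lines into one record
# per message ID, with each measurement ID becoming one key (column).
def pivot_by_message(lines):
    records = defaultdict(dict)
    for line in lines:
        measurement_id, ts, message_id = line.strip().split("|")
        records[message_id][measurement_id] = int(ts)
    return records
```

Each resulting record can then be inserted as one MySQL row with the six timestamp columns (RECEIVE_START/END, PROCESS_START/END, SEND_START/END).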
My question is this: what are the best tactics and tools to analyse this relatively large data set (think a month's or several months' worth of log data)? In particular I would like to calculate and graph:
a) the distribution of response times (e.g. SEND_END - RECEIVE_START) for a selected time frame (e.g. monthly, daily, hourly);
b) the frequency of messages per time unit (second, hour, day, week, month), over a selected time period (e.g. day, week, month, year).
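To show concretely what I mean by a) and b), here is a stdlib-only sketch over the pivoted records; the `arrivals` input for b) is an assumed list of wall-clock datetimes per message, which is not part of the current schema:

```python
import statistics
from collections import Counter

# a) Distribution of SEND_END - RECEIVE_START, in milliseconds,
# assuming each record holds the pivoted nanosecond timestamps.
def response_time_stats(records):
    latencies_ms = [(r["SEND_END"] - r["RECEIVE_START"]) / 1e6
                    for r in records]
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        # last of 19 cut points for n=20 is the 95th percentile
        "p95": statistics.quantiles(latencies_ms, n=20)[-1],
    }

# b) Message counts bucketed by a strftime pattern (default: per hour);
# "arrivals" is an assumed list of datetime objects, one per message.
def messages_per_unit(arrivals, fmt="%Y-%m-%d %H"):
    return Counter(dt.strftime(fmt) for dt in arrivals)
```

Something along these lines works for a single day, but I am unsure it scales to months of data, hence the question about tools.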
Any hints or reports on your own experience are appreciated.