Consider a Java application that receives financial trading transactions and determines their validity by applying several checks, e.g. whether the transaction is allowed under contractual and legal constraints. The application implements a JMS message handler that receives messages on one queue and sends the result back to the consumer on a second queue.
In order to measure response times and enable post-processing performance analysis, the application logs the start and end time of several steps, e.g. receiving the message, processing it, and preparing and sending the answer back to the client. The application receives approx. 3 million messages per day, and hence produces a multiple of that number of time measurements (around 18 million logged measurements a day). Each measurement consists of the following data: a measurement ID (e.g. RECEIVE_START/END, PROCESS_START/END, SEND_START/END), a timestamp as returned by System.nanoTime(), and a unique message ID. The time measurements are written to a log file.
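To make the data concrete, here is a minimal sketch of what writing one such measurement could look like; the field order, the pipe separator, and the function name are illustrative assumptions, not the actual log format:

```python
import time

# Illustrative only: one measurement = (measurement ID, nanosecond
# timestamp, unique message ID), written as one pipe-separated line.
def log_measurement(measurement_id, message_id, out):
    # time.monotonic_ns() is a Python stand-in for Java's System.nanoTime()
    ts = time.monotonic_ns()
    out.write(f"{measurement_id}|{ts}|{message_id}\n")
```

So a RECEIVE_START line for message msg-001 would look like `RECEIVE_START|123456789|msg-001`.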
To find the processing times, the log file is transformed and loaded into a MySQL database on a daily basis. This is done by a sequence of Python scripts that take the raw log data, transform it, and store it in a MySQL table, where each record corresponds to one processed message, with each measurement in its own column (i.e. the table groups records by the unique message ID).
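The core of that pivot step can be sketched as follows; it assumes the pipe-separated line format from above, which is my illustration rather than the real log format:

```python
from collections import defaultdict

# Sketch of the transform: group raw measurement lines into one record
# per message ID, with each measurement ID becoming one key (column).
def pivot_by_message(lines):
    records = defaultdict(dict)
    for line in lines:
        measurement_id, ts, message_id = line.strip().split("|")
        records[message_id][measurement_id] = int(ts)
    return records
```

Each resulting record can then be inserted as one MySQL row with the six timestamp columns (RECEIVE_START/END, PROCESS_START/END, SEND_START/END).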
My question is this: what are the best tactics and tools to analyse this relatively large data set (think a month's or several months' worth of log data)? In particular I would like to calculate and graph:
a) the distribution of response times (e.g. SEND_END - RECEIVE_START) for a selected time frame (e.g. monthly, daily, hourly);
b) the frequency of messages per time unit (second, hour, day, week, month), over a selected time period (e.g. day, week, month, year).
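To show concretely what I mean by a) and b), here is a stdlib-only sketch over the pivoted records; the `arrivals` input for b) is an assumed list of wall-clock datetimes per message, which is not part of the current schema:

```python
import statistics
from collections import Counter

# a) Distribution of SEND_END - RECEIVE_START, in milliseconds,
# assuming each record holds the pivoted nanosecond timestamps.
def response_time_stats(records):
    latencies_ms = [(r["SEND_END"] - r["RECEIVE_START"]) / 1e6
                    for r in records]
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        # last of 19 cut points for n=20 is the 95th percentile
        "p95": statistics.quantiles(latencies_ms, n=20)[-1],
    }

# b) Message counts bucketed by a strftime pattern (default: per hour);
# "arrivals" is an assumed list of datetime objects, one per message.
def messages_per_unit(arrivals, fmt="%Y-%m-%d %H"):
    return Counter(dt.strftime(fmt) for dt in arrivals)
```

Something along these lines works for a single day, but I am unsure it scales to months of data, hence the question about tools.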
Any hints or reports on your own experience are appreciated.