I'm analyzing database performance by counting the active database processes at specific timestamps. Here is an example:

CloseStatement,ClusterIndexScanVecOutJob<ScanRangePredicate>,ExecQidItab,ExecutePrepared,ExecuteStatement,NoAction,PrepareStatement,core/stat,timestamp

1,1,2,15,1,1,5,1,2020-03-30T18:15:24.378238

CloseCursor,ClusterIndexScanVecOutJob<ScanRangePredicate>,CommitTrans,ExecQidItab,ExecutePrepared,ExecuteStatement,JobParallelMgetSearch,NoAction,ParallelFor Job,PrepareStatement,SearchPartJob,core/stat,flushing,timestamp

1,1,1,6,16,1,2,1,9,2,5,1,1,2020-03-30T18:16:24.435657

The first line of each pair lists the names of the database processes, and the line that follows contains the count for each process. For example, there was 1 database process called 'CloseStatement' and there were 15 'ExecutePrepared' processes at timestamp 2020-03-30T18:15:24.378238.

I'm trying to build statistics based on the process counts at specific times. From a pandas perspective, the headers (CloseStatement, CloseCursor) differ at each instance and aren't uniform either. How can I import this into a dataframe? Thanks for your time!

  • This one here https://stackoverflow.com/questions/26063231/read-specific-columns-with-pandas-or-other-python-module might be of some help if the header names remain the same – vvk24 May 14 '20 at 04:34
  • My problem is that the dataset contains 2 lines for every timestamp. The first line is the header and the second line is the data. So I would run into parse errors because the number of fields differs between pairs. – Jerry Kross May 15 '20 at 04:49
  • I agree. Try giving the field name directly as mentioned in the link above. That works if you have the same field names in all the datasets irrespective of the column index. – vvk24 May 16 '20 at 06:36
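
One possible way to handle the alternating header/data layout described in the question is to skip pandas' CSV parser and pair the lines up by hand, building one dict per timestamp. The sketch below is only an illustration under a few assumptions: the names `read_snapshots` and `SAMPLE` are hypothetical, every header line is assumed to contain a `timestamp` column, and a process that is missing from a snapshot is assumed to mean a count of 0.

```python
import csv
import io

import pandas as pd

# Sample copied from the question: header and value lines alternate,
# separated by blank lines.
SAMPLE = """\
CloseStatement,ClusterIndexScanVecOutJob<ScanRangePredicate>,ExecQidItab,ExecutePrepared,ExecuteStatement,NoAction,PrepareStatement,core/stat,timestamp

1,1,2,15,1,1,5,1,2020-03-30T18:15:24.378238

CloseCursor,ClusterIndexScanVecOutJob<ScanRangePredicate>,CommitTrans,ExecQidItab,ExecutePrepared,ExecuteStatement,JobParallelMgetSearch,NoAction,ParallelFor Job,PrepareStatement,SearchPartJob,core/stat,flushing,timestamp

1,1,1,6,16,1,2,1,9,2,5,1,1,2020-03-30T18:16:24.435657
"""


def read_snapshots(lines):
    """Pair each header line with the value line after it (hypothetical helper)."""
    # csv.reader yields an empty list for a blank line, so those are dropped.
    rows = [row for row in csv.reader(lines) if row]
    if len(rows) % 2:
        raise ValueError("expected an even number of non-blank lines")

    timestamps, records = [], []
    for header, values in zip(rows[0::2], rows[1::2]):
        if len(header) != len(values):
            raise ValueError(f"field count mismatch for header {header}")
        record = dict(zip(header, values))
        # Assumption: each header line contains a 'timestamp' column.
        timestamps.append(record.pop("timestamp"))
        records.append({name: int(count) for name, count in record.items()})

    # The columns become the union of all process names seen in any snapshot.
    df = pd.DataFrame(records, index=pd.to_datetime(timestamps))
    df.index.name = "timestamp"
    # Assumption: a process absent from a snapshot had a count of 0.
    return df.fillna(0).astype(int)


df = read_snapshots(io.StringIO(SAMPLE))
print(df[["CloseStatement", "CloseCursor", "ExecutePrepared"]])
```

For a real file, the same function should accept an open file object, for example `read_snapshots(open(path, newline=""))` with `path` pointing at the log. If a missing process should stay distinguishable from a zero count, the final `fillna(0).astype(int)` step can be dropped so the gaps remain `NaN`.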
