
I have a log file and need to split the comma-separated values out into separate arrays of values for graphing later. The log file is written to once a second.

Data is formatted as follows:

Time, Bx, By, Bz, Status
2147483894.995726, 3424, 3424, 3424, 128
2147483895.9957414, 3552, 3552, 3552, 128
2147483896.995726, 3680, 3680, 3680, 128
2147483897.995711, 3808, 3808, 3808, 128
2147483898.9956956, 3936, 3936, 3936, 128
2147483899.9956803, 4064, 4064, 4064, 128

What's the best way of doing this? Regular expressions? I have tried using line.rsplit but can only extract the last column.

Any help appreciated! Thanks

3 Answers


I highly recommend using Pandas. If you don't have it, just pip install pandas. Then, supposing your csv is named test.csv, run the following code:

import pandas as pd

df = pd.read_csv("test.csv", sep=',')

Now, df is a pandas dataframe, with every line from the original file as a row. You can iterate through them to accomplish what you want. For example:

for index, row in df.iterrows():
    print(row)

Output: 
    Time, Bx, By, Bz, Status    2147483894.995726, 3424, 3424, 3424, 128
    Name: 0, dtype: object
    Time, Bx, By, Bz, Status    2147483895.9957414, 3552, 3552, 3552, 128
    Name: 1, dtype: object
    Time, Bx, By, Bz, Status    2147483896.995726, 3680, 3680, 3680, 128
    Name: 2, dtype: object
    Time, Bx, By, Bz, Status    2147483897.995711, 3808, 3808, 3808, 128
    Name: 3, dtype: object
    Time, Bx, By, Bz, Status    2147483898.9956956, 3936, 3936, 3936, 128
    Name: 4, dtype: object
    Time, Bx, By, Bz, Status    2147483899.9956803, 4064, 4064, 4064, 128
    Name: 5, dtype: object
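Since the end goal here is one array per column for graphing, you can also index the dataframe by column name rather than iterating over rows. A minimal sketch, assuming the header from the question; skipinitialspace=True is added to handle the space that follows each comma in the sample data:

import pandas as pd

# skipinitialspace=True strips the space after each comma, so the
# header parses as "Bx" rather than " Bx"
df = pd.read_csv("test.csv", skipinitialspace=True)

bx = df["Bx"].to_numpy()  # one array per column, ready for plotting
by = df["By"].to_numpy()
bz = df["Bz"].to_numpy()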
Pedro Martins de Souza
  • Thanks, will give this a try. I need to keep checking the log file for updates, as it is written to by another running app. Would reading in the whole file with pandas every time be too processing-heavy once the file gets rather large? – mr_Alex_Nok_ Jan 03 '19 at 12:15
  • It's no problem! To do so, use the chunksize parameter (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html), so you can process your data part by part without skipping anything. I've already processed CSVs greater than 10 GB with it :) (a sketch of this follows these comments) – Pedro Martins de Souza Jan 03 '19 at 12:18
  • Also, as @Myggan mentioned below, you can decide to read only the last lines from it. In any case, I highly recommend you read the pandas documentation, since you'll certainly use it. After doing so, if you still think pandas alone won't be enough, use dask as a complement to pandas (http://docs.dask.org/en/latest/dataframe.html) – Pedro Martins de Souza Jan 03 '19 at 12:20
  • Thanks, am giving it a try. I am struggling, though, with how to index by column. I want to pass the Bx column to a plotting function but can't see how to. – mr_Alex_Nok_ Jan 03 '19 at 13:58
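For reference, a minimal sketch of the chunksize approach mentioned in the comments above; the chunk size of 1000 rows and the process() handler are placeholders, not part of the original answer:

import pandas as pd

# chunksize makes read_csv return an iterator of smaller dataframes
# instead of loading the whole file into memory at once
for chunk in pd.read_csv("test.csv", skipinitialspace=True, chunksize=1000):
    process(chunk)  # hypothetical per-chunk handler, e.g. append to your arrays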

You can use the csv module and its csv.reader function.
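For example, a minimal sketch that collects each column into its own list (the filename test.csv is assumed from the question):

import csv

times, bx, by, bz, status = [], [], [], [], []
with open("test.csv", newline="") as f:
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        times.append(float(row[0]))
        bx.append(int(row[1]))
        by.append(int(row[2]))
        bz.append(int(row[3]))
        status.append(int(row[4]))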

Bart Vanherck

Wouldn't pandas help you here?

import pandas as pd

df = pd.read_csv("log.csv")
Myggan
  • Will look into them. I need to periodically check the log file, as it will be updated. I was going to hold on to a variable for the number of lines in the file, to know when it has been updated, and only read out the latest values – mr_Alex_Nok_ Jan 03 '19 at 12:12
  • You don't need to read the entire file if it's becoming big. Maybe this discussion can help you, by only reading the last x lines of the file: https://stackoverflow.com/questions/12523044/how-can-i-tail-a-log-file-in-python (a sketch follows) – Myggan Jan 03 '19 at 12:18
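A minimal sketch of that tail-style approach, assuming the other app only ever appends to the log: keep the file handle open and poll for new lines once a second:

import time

with open("log.csv") as f:
    f.readline()  # skip the header line
    while True:
        line = f.readline()
        if not line:       # nothing new yet; the logger writes once a second
            time.sleep(1)
            continue
        fields = [s.strip() for s in line.split(",")]
        # fields[0] is Time, fields[1:4] are Bx/By/Bz, fields[4] is Status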