0

I have a csv file like this:

2021-01-05 10:57:12.762000, REDDE EHZ AM 00, trigger
2021-01-05 10:58:26.622000, REDDE EHZ AM 00, trigger
2021-01-05 11:02:16.772000, REDDE EHZ AM 00, trigger
2021-01-05 11:02:34.042000, REDDE EHZ AM 00, trigger
2021-01-05 17:12:07.221999, REDDE EHZ AM 00, trigger
2021-01-06 01:42:45.501999, REDDE EHZ AM 00, trigger
2021-01-06 01:44:24.481999, REDDE EHZ AM 00, trigger
2021-01-06 01:44:58.051999, REDDE EHZ AM 00, trigger
2021-01-06 01:45:14.871999, REDDE EHZ AM 00, trigger
2021-01-06 01:47:10.901999, REDDE EHZ AM 00, trigger
2021-01-06 07:57:33.221999, REDDE EHZ AM 00, trigger
2021-01-06 07:57:48.821999, REDDE EHZ AM 00, trigger
2021-01-06 07:58:51.031999, REDDE EHZ AM 00, trigger
2021-01-06 07:59:27.001999, REDDE EHZ AM 00, trigger
2021-01-06 08:00:56.871999, REDDE EHZ AM 00, trigger
2021-01-06 11:28:17.191999, REDDE EHZ AM 00, trigger
2021-01-06 11:28:46.201999, REDDE EHZ AM 00, trigger
2021-01-06 11:29:19.111999, REDDE EHZ AM 00, trigger
2021-01-06 11:29:41.891999, REDDE EHZ AM 00, trigger
2021-01-06 11:30:51.901999, REDDE EHZ AM 00, trigger
2021-01-06 11:31:21.921999, REDDE EHZ AM 00, trigger
2021-01-06 11:32:23.001999, REDDE EHZ AM 00, trigger
2021-01-06 11:32:58.271999, REDDE EHZ AM 00, trigger
2021-01-07 11:33:46.891999, REDDE EHZ AM 00, trigger
2021-01-07 12:38:50.021999, REDDE EHZ AM 00, trigger
2021-01-07 12:39:53.881999, REDDE EHZ AM 00, trigger
2021-01-08 12:42:07.371999, REDDE EHZ AM 00, trigger
2021-01-08 12:42:46.441999, REDDE EHZ AM 00, trigger
2021-01-09 12:44:14.291999, REDDE EHZ AM 00, trigger

I added the header with:

df = pd.read_csv(r'D:\Inves\SM\CC_Cbba\REDPy\OSCREDDE_3_\redde_3_trigs.dat',
                 sep=',', header=None, usecols=[0, 1, 2])
headers =  ["TrigDT", "Sta", "Type"]

The output is:

                       TrigDT               Sta      Type
0  2021-01-05 10:57:12.762000   REDDE EHZ AM 00   trigger
1  2021-01-05 10:58:26.622000   REDDE EHZ AM 00   trigger
2  2021-01-05 11:02:16.772000   REDDE EHZ AM 00   trigger
3  2021-01-05 11:02:34.042000   REDDE EHZ AM 00   trigger
4  2021-01-05 17:12:07.221999   REDDE EHZ AM 00   trigger
...

I created a Date colum in roder to try to group the information by days:

df['TrigDT'] = pd.to_datetime(df['TrigDT'])
df['Date'] = df['TrigDT'].dt.date

I tried to have a cumulative sum with Index because I do not have a colum with counter of events, then I tried to group by days but I fail:

df = df.groupby('Date').index.sum()
df = df.groupby(df.index.day).cumsum().reset_index()

The idea is to create a cumulative plot with the DataFrame information (X axis with dates a Y axis with cumulative information), I tried to have a plot like https://stackoverflow.com/questions/53895480/python-plot-timedelta-and-cumulative-values

Would you mind to give me some tips to reach the objective?, the expected output could be like this, in my case only with one station that is called REDDE:

enter image description here Thank you

tonino
  • 73
  • 11
  • What' should be displayed on the Y axis? It seems a numerical column is missing... Could you add your expected output to the question? – Tranbi Jan 19 '22 at 07:47
  • Hi @Tranbi, I edited the question adding the image of the expected output, you are rigth, it is missing a numerical column, how can I generate it?, can be the index colum the one I need to plot the cumulative? – tonino Jan 19 '22 at 13:05
  • IIUC you want to display the normated count of rows? Something like `df['cumul_norm'] = df.index / len(df) * 100` could help you. What are a, b and c? Do you want one plot per `Sta` value? In that case might want to group your df first... Your question could benefit from more details anyway. – Tranbi Jan 19 '22 at 13:38
  • Hi @Tranbi, Yes the normated count of rows, a-b-c are station names, can not be grouped by only days? – tonino Jan 19 '22 at 14:42

1 Answers1

1

Not sure I understood what a, b, c would be exactly in your chart, but here is how you can accumulate the number of triggers daily:

# Load the data and set 'TrigDT' as the index
df = pd.read_csv('path_to_file.csv', header=None, usecols=(0, 1, 2), names=('TrigDT', 'Sta', 'Type'))    
df.TrigDT = pd.to_datetime(df.TrigDT)
df.set_index('TrigDT', inplace=True)

# Resample to daily and count
daily_acc = df.resample('d').count().cumsum()                                                    
daily_acc.plot()
fsl
  • 3,250
  • 1
  • 10
  • 20