-1

Hi, I have an ECDF plot made with seaborn, shown below.

I obtain it with sns.ecdfplot(data=df2, x='time', hue='seg_oper', stat='count').

enter image description here

My dataframe is very simple:

In [174]: df2
Out[174]: 
           time           seg_oper
265       18475     1->0:ADD['TX']
2342      78007     0->1:ADD['RX']
2399      78613  1->0:DELETE['TX']
2961      87097     0->1:ADD['RX']
2994      87210     0->1:ADD['RX']
...         ...                ...
330823  1002281  1->0:DELETE['TX']
331256  1003545  1->0:DELETE['TX']
331629  1004961  1->0:DELETE['TX']
332375  1006663  1->0:DELETE['TX']
333083  1008644  1->0:DELETE['TX']

[834 rows x 2 columns]

How can I subtract the series 0->1:ADD['RX'] from 1->0:DELETE['TX']?

I like seaborn because most of this data wrangling is done inside the library, but in this case I need to subtract these two series myself.

Thanks.

Lucas Aimaretto
  • I must be missing something, but can you elaborate on what you mean by subtracting series `0->1:ADD['RX']` from `1->0:DELETE['TX']`? For example, given your sample input, what do you expect the output to look like? – itprorh66 Sep 09 '21 at 21:41
  • You have to manually calculate the `ecdf` for each `seg_oper`. However, it doesn't make sense to calculate the difference between the ecdfs. Also, as shown [here](https://trenton3983.github.io/files/projects/2019-07-10_statistical_thinking_1/2019-07-10_statistical_thinking_1.html#plot-multiple-ECDFs), not all points from multiple ecdfs align. Review [What, Why, and How to Read Empirical CDF](https://towardsdatascience.com/what-why-and-how-to-read-empirical-cdf-123e2b922480) and [Compare distributions of two ECDFs](https://stats.stackexchange.com/questions/115132/compare-distributions-of-two-ecdfs) – Trenton McKinney Sep 10 '21 at 02:07
  • @itprorh66, from the given DF I want to subtract one series `df2[0->1:ADD['RX']]` from the other `df2[1->0:DELETE['TX']]`. The plot is what I obtain from the seaborn library automatically. In my DF I have everything together; you distinguish one series from the other by the field `seg_oper`. Seaborn uses the `hue` parameter to do so. I have however solved it; I'll post an answer. – Lucas Aimaretto Sep 10 '21 at 11:19
  • @TrentonMcKinney, yes, simply put I want to obtain the series that seaborn finds on its own (and then operate on them the way I want). I have seen that the samples are misaligned, but I have solved that. I'll post an answer. Thanks. – Lucas Aimaretto Sep 10 '21 at 11:21

1 Answer

0

The first step is to reproduce manually what seaborn computes. After that, I can subtract one series from the other.

Cumulative Count

First we need a cumulative count for each series.

In [304]: df2['cum'] = df2.groupby(['seg_oper']).cumcount()

In [305]: df2
Out[305]: 
           time           seg_oper  cum
265       18475     1->0:ADD['TX']    0
2961      87097     0->1:ADD['RX']    1
2994      87210     0->1:ADD['RX']    2
...         ...                ...  ...
332375  1006663  1->0:DELETE['TX']  413
333083  1008644  1->0:DELETE['TX']  414
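As a rough illustration of this step, here is a minimal sketch on made-up data (the `toy` frame and its `A`/`B` labels are hypothetical, not the original `df2`). It assumes the rows should be sorted by `time` before counting, and it adds a `count` column because `cumcount()` starts at 0 while a count of observations seen so far starts at 1.

```python
import pandas as pd

# Hypothetical toy data standing in for df2 (not the original data).
toy = pd.DataFrame({
    "time": [10, 20, 30, 40, 50, 60],
    "seg_oper": ["A", "B", "A", "B", "B", "A"],
})

# Sort by time so the running count follows the x-axis order.
toy = toy.sort_values("time")

# cumcount() numbers the rows of each group 0, 1, 2, ...
toy["cum"] = toy.groupby("seg_oper").cumcount()

# Adding 1 gives the number of observations seen so far in each group.
toy["count"] = toy["cum"] + 1

print(toy)
```

Whether to use `cum` or `count` depends on whether the first event of a series should plot at 0 or at 1; the difference between two series is the same either way once both have started.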

Pivot the data

Rearrange the DF.

In [307]: df3 = df2.pivot(index='time', columns='seg_oper',values='cum').reset_index()

In [308]: df3
Out[308]: 
seg_oper     time  0->1:ADD['RX']  1->0:ADD['TX']  1->0:DELETE['TX']
0           18475             NaN             0.0                NaN
1           78007             0.0             NaN                NaN
2           78613             NaN             NaN                0.0
3           87097             1.0             NaN                NaN
4           87210             2.0             NaN                NaN
..            ...             ...             ...                ...
828       1002281             NaN             NaN              410.0
829       1003545             NaN             NaN              411.0
830       1004961             NaN             NaN              412.0
831       1006663             NaN             NaN              413.0
832       1008644             NaN             NaN              414.0

[833 rows x 4 columns]
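One caveat worth sketching: `DataFrame.pivot` raises a `ValueError` when the same (`time`, `seg_oper`) pair appears more than once. The example below uses made-up data (the `toy` frame is hypothetical) and shows `pivot_table` with `aggfunc="max"` as one possible workaround, on the assumption that for a cumulative count the largest value at a given time is the one to keep.

```python
import pandas as pd

# Hypothetical toy data: series "B" has two events at time 20.
toy = pd.DataFrame({
    "time": [10, 20, 20, 30],
    "seg_oper": ["A", "B", "B", "A"],
    "cum": [0, 0, 1, 1],
})

# pivot() cannot reshape duplicate (index, column) pairs.
try:
    toy.pivot(index="time", columns="seg_oper", values="cum")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates duplicates; "max" keeps the latest running count.
wide = toy.pivot_table(
    index="time", columns="seg_oper", values="cum", aggfunc="max"
).reset_index()

print(pivot_failed)
print(wide)
```

If the timestamps are unique within each series, plain `pivot` as used above is enough.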

Fill the gaps

I'm assuming that each NaN can be filled with the previous value in its column, carried forward until the next observation.

df3 = df3.ffill()

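At this point, if you plot df3 you'll obtain essentially the same curves as sns.ecdfplot(data=df2, x='time', hue='seg_oper', stat='count') (cumcount() starts at 0, so each curve sits one count below seaborn's).

I still want to subtract one series from the other.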

df3['diff'] = df3["0->1:ADD['RX']"] - df3["1->0:DELETE['TX']"]
df3.plot(x='time') 

The following plot is the result.

enter image description here
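For reference, the whole procedure can be sketched end to end on made-up data. The `toy` frame and its `add`/`delete` labels are hypothetical. Two choices here are assumptions that differ slightly from the steps above: the count starts at 1 rather than 0, and a series is treated as 0 before its first event instead of being left as NaN.

```python
import pandas as pd

# Hypothetical toy data standing in for df2 (not the original data).
toy = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6],
    "seg_oper": ["add", "add", "delete", "add", "delete", "delete"],
})

# Running count per series; +1 so the first event counts as 1 (assumption).
toy["cum"] = toy.groupby("seg_oper").cumcount() + 1

# One column per series, indexed by time; gaps become NaN.
wide = toy.pivot(index="time", columns="seg_oper", values="cum")

# Carry each count forward, and use 0 before a series' first event (assumption).
wide = wide.ffill().fillna(0)

# Difference between the two step functions at every observed time.
wide["diff"] = wide["add"] - wide["delete"]

print(wide)
```

With matplotlib installed, `wide.plot()` would draw the two series together with their difference.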

PS: I don't understand the downvote on the question. If someone can explain it, I'd appreciate it.

Lucas Aimaretto