I have a DataFrame df with two columns x and y which I would like to plot as a line plot as follows:

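import matplotlib.pyplot as plt
import seaborn as sns

# x and y are assumed to hold the names of the two columns
fig, ax = plt.subplots(figsize=(9, 7))

# Average y over duplicate x values, then order the rows by x
df = df.groupby(x, as_index=False).mean()
df = df.sort_values(x)

# Smooth with a fixed window of 1000 rows; the first 999 rows become NaN
df[y] = df[y].rolling(1000).mean()
df = df.dropna()

sns.lineplot(data=df, x=x, y=y, ax=ax)
plt.tight_layout()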

The resulting plot looks as follows:

[Image: line plot of the smoothed y over x]

As can be seen, there are many more data points at low x-values; as x increases, there are fewer and fewer data points. Thus, a rolling average with a fixed window size of 1000 smooths too much for large x-values and too little for small x-values.

Is it possible to make the rolling-average window shrink as x grows, or adapt to the number of data points? Or is there a better approach than a rolling average for this kind of data?
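To make the idea of a shrinking window concrete, here is a minimal sketch of a trailing mean whose window length is given per row. The helper name `variable_rolling_mean`, the `demo` frame, and the linear shrink from 4 rows to 1 row are all hypothetical and only illustrate the mechanics; they are not taken from the data above. Unlike `rolling(1000).mean()`, this sketch averages over partial windows at the start instead of returning NaN, and it assumes the values contain no NaN.

```python
import numpy as np
import pandas as pd


def variable_rolling_mean(values, windows):
    """Trailing mean where row i averages at most the last windows[i] rows.

    Assumes every entry of `windows` is >= 1 and `values` has no NaN.
    """
    values = np.asarray(values, dtype=float)
    windows = np.asarray(windows, dtype=int)
    idx = np.arange(len(values))
    # First row included in each window, clipped at the start of the data
    start = np.maximum(0, idx - windows + 1)
    # Prefix sums let each window sum be computed as a difference
    csum = np.concatenate(([0.0], np.cumsum(values)))
    return (csum[idx + 1] - csum[start]) / (idx + 1 - start)


# Hypothetical data, already sorted by x
demo = pd.DataFrame({"x": np.arange(10), "y": np.arange(10, dtype=float)})

# Window length shrinks linearly from 4 rows (small x) to 1 row (large x)
windows = np.linspace(4, 1, len(demo)).round().astype(int)
demo["y_smooth"] = variable_rolling_mean(demo["y"], windows)
```

How the window length should depend on x (linear, proportional to the local point density, or something else) is a modelling choice that this sketch leaves open.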

machinery
  • have you considered plotting your data on a semilog plot? – yann ziselman May 18 '21 at 12:00
  • @yanziselman On a semilogx plot, when I plot a regression line using `b, m = polyfit(df[x], df[y], 1)` and `plt.plot(df[x], b + m * df[x], '--', color=[0.5, 0.5, 0.5])`, the line is no longer straight but bent (due to the log scale). Is it possible to get a straight regression line in a log plot? – machinery May 18 '21 at 12:55
  • your data looks like decaying oscillations. so no, a straight line would not be a good model to fit to your data. – yann ziselman May 18 '21 at 13:07
  • @yanziselman So on a semilogx plot should I just plot the linear regression line on the log scale (resulting in a bent curve) or what regression line would you use instead? – machinery May 18 '21 at 13:57
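
On the straight-line question in the comments: a line that is linear in x appears bent on a logarithmic x-axis, whereas a fit of y against log10(x) appears straight there. The sketch below assumes `polyfit` is `numpy.polynomial.polynomial.polyfit` (which returns the intercept first, matching the `b, m = ...` unpacking in the comment) and uses made-up arrays `x_vals` and `y_vals` with strictly positive x. Whether such a model suits the actual data is a separate question.

```python
import numpy as np
from numpy.polynomial.polynomial import polyfit

# Hypothetical data: y is exactly linear in log10(x); x must be positive
x_vals = np.logspace(0, 3, 50)
y_vals = 2.0 - 0.5 * np.log10(x_vals)

# Fit y = b + m * log10(x); coefficients are returned lowest degree first
b, m = polyfit(np.log10(x_vals), y_vals, 1)

# Evaluate the fit at the original x positions
y_fit = b + m * np.log10(x_vals)
```

Plotting `y_fit` against `x_vals` after calling `plt.xscale('log')` should then give a straight dashed line, for example with `plt.plot(x_vals, y_fit, '--')`.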

0 Answers