The Plotly Express lowess trendline gives a very poor fit. It works well in some situations, but not with this particular data.

When I compute the LOWESS trendline with statsmodels.nonparametric.smoothers_lowess.lowess(), I get a much better fit because it lets me tweak the frac parameter.

So, is there a way to control the parameters of Plotly's lowess trendline to fix this?

The df DataFrame:

    date        y
1   2016-05-01  15
2   2016-05-08  18
3   2016-05-15  19
4   2016-05-22  14
5   2016-05-29  20
6   2016-06-05  18
7   2016-06-12  17
8   2016-06-19  18
9   2016-06-26  21
10  2016-07-03  23
11  2016-07-10  13
12  2016-07-17  16
13  2016-07-24  19
14  2016-07-31  31
18  2016-08-28  21
19  2016-09-04  22
20  2016-09-11  16
21  2016-09-18  17
22  2016-09-25  15
23  2016-10-02  12
24  2016-10-09  12
25  2016-10-16  12
26  2016-10-23  16
27  2016-10-30  13
28  2016-11-06  14
29  2016-11-13  14
30  2016-11-20  15
31  2016-11-27  12
32  2016-12-04  15
33  2016-12-11  14
34  2016-12-18  9
35  2016-12-25  9
36  2017-01-01  9
37  2017-01-08  7
38  2017-01-15  12
39  2017-01-22  10
40  2017-01-29  14
41  2017-02-05  10
42  2017-02-12  13
43  2017-02-19  12
44  2017-02-26  11
45  2017-03-05  17
46  2017-03-12  13
47  2017-03-19  13
48  2017-03-26  13
49  2017-04-02  15
50  2017-04-09  12
51  2017-04-16  11
52  2017-04-23  11

Output of df.info():

<class 'pandas.core.frame.DataFrame'>
Int64Index: 49 entries, 1 to 52
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    49 non-null     datetime64[ns]
 1   y       49 non-null     int32         
dtypes: datetime64[ns](1), int32(1)
memory usage: 2.0 KB

The plotting code:

import plotly.express as px
px.scatter(data_frame=df,
           x="date",
           y="y",
           trendline="lowess")

That produces a plot with the poorly fitting trendline described above.

Plotly version: 4.14.3
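
For reference, the statsmodels fit mentioned above could look roughly like the sketch below. The y values are copied from the df listing. Using the DataFrame index as x (it advances by one per week, so the three-week gap is preserved) and the value frac=0.2 are assumptions for illustration, not necessarily what was used originally.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# y values copied from the df listing in the question.
y = np.array([15, 18, 19, 14, 20, 18, 17, 18, 21, 23, 13, 16, 19, 31,
              21, 22, 16, 17, 15, 12, 12, 12, 16, 13, 14, 14, 15, 12,
              15, 14, 9, 9, 9, 7, 12, 10, 14, 10, 13, 12, 11, 17, 13,
              13, 13, 15, 12, 11, 11], dtype=float)

# Assumption: use the DataFrame index (1..14 and 18..52) as a weekly x axis.
x = np.r_[1:15, 18:53].astype(float)

# frac is the share of points used for each local regression.
# statsmodels defaults to 2/3; a smaller value follows the data more closely.
smooth_default = lowess(y, x, return_sorted=False)
smooth_local = lowess(y, x, frac=0.2, return_sorted=False)
```

Both results are arrays of fitted values in the same order as y, so they can be plotted against the dates directly.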

Edit: the corresponding GitHub issue is https://github.com/plotly/plotly.py/issues/3202

Thank you

GitHunter0
  • Have you tried to get the best curve fit with polyfit/polyval in numpy? Did you use any (non)parametric test, like a t-test, to measure the quality of the solution provided? Another available trendline is "ols" (Ordinary Least Squares); maybe it's a better match. – crissal Apr 21 '21 at 04:13
  • @crissal No, I just tested with statsmodels lowess function and got a much better fit. I believe the difference is that with statsmodels you can set the fitness parameters, while with Plotly you cannot as far as I know. I did not use "OLS" because it draws a line and I want a very well adjusted curve. – GitHunter0 Apr 21 '21 at 15:10

1 Answer

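Pass trendline_options=dict(frac=0.2) to px.scatter(). The frac value is forwarded to the statsmodels LOWESS smoother, where smaller values give a more local fit. Note that trendline_options was added in Plotly 5.2, so you need to upgrade from the 4.14.3 release mentioned in the question.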

import plotly.express as px
px.scatter(data_frame=df,
           x="date",
           y="y",
           trendline="lowess",
           trendline_options=dict(frac=0.2))