You've already put together a procedure that solves your problem, but I'd like to mention that plotly.express can do the very same thing in just a few lines of code. With px.scatter() there are actually two slightly different ways to do it, depending on whether your data is in a long or a wide format. Your data seems to be in the latter format, since you're asking:
how can I make this work with separate traces?
So I'll start with that. And since you haven't provided a data sample, I'll use a subset of the built-in dataset px.data.stocks().
Code 1 - Wide data
fig_wide = px.scatter(df_wide, x = 'index', y = ['AAPL', 'MSFT'],
trendline = 'ols',
)
Code 2 - Long data
fig_long = px.scatter(df_long, x= 'index', y = 'value',
color = 'variable',
trendline = 'ols')
Plot 1 - Identical results

About the data:
A dataframe of a wide format typically has an index with unique values in the left-most column, variable names in the column headers, and corresponding values for each variable per index in the columns like this:
index AAPL MSFT
0 1.000000 1.000000
1 1.011943 1.015988
2 1.019771 1.020524
3 0.980057 1.066561
4 0.917143 1.040708
Here, adding information about another variable would require adding another column.
A dataframe of a long format, on the other hand, typically organizes the same data in just three columns (though there can be more): index, variable and value:
index variable value
0 AAPL 1.000000
1 AAPL 1.011943
.
.
100 MSFT 1.720717
101 MSFT 1.752239
And contrary to the wide format, this means that index will have duplicate values. But for a good reason.
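To make the relation between the two formats concrete, here is a minimal sketch that turns a three-row wide sample into its long counterpart with pd.melt(). The three rows are just the first values from the wide table above, and the default column names variable and value come from pandas, not from plotly.

```python
import pandas as pd

# Small wide-format sample (first three rows of the wide table shown earlier)
df_wide = pd.DataFrame({
    "index": [0, 1, 2],
    "AAPL": [1.000000, 1.011943, 1.019771],
    "MSFT": [1.000000, 1.015988, 1.020524],
})

# One row per (index, variable) pair; pandas names the new columns
# 'variable' and 'value' by default
df_long = pd.melt(df_wide, id_vars="index")

print(df_long)

# Each index value now appears once per variable, so duplicates are expected
print(df_long["index"].duplicated().any())
```

Every index value shows up once for AAPL and once for MSFT, which is exactly the duplication mentioned above.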
So what's the difference?
If you look at Code 1 you'll see that, beyond x and y, the only thing you need to specify for px.scatter in order to get multiple traces with trendlines (in this case AAPL and MSFT on the y-axis versus an index on the x-axis) is trendline = 'ols'. This is because plotly.express automatically identifies the data format as wide and knows how to apply the trendlines correctly. Different columns mean different categories, and a trace and a trendline are produced for each of them.
As for the "long approach", you've got both AAPL and MSFT in the same variable column, and values for both of them in the value column. But setting color = 'variable' lets plotly.express know how to categorize the variable column, correctly separate the data in the value column, and thus correctly produce the trendlines. A different name in the variable column means that index and value in the same row belong to a different category, for which a new trace and trendline are built.
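If it helps to see what "one trendline per category" means in practice, here is a rough sketch of the same idea done by hand. It is not what plotly.express runs internally (px relies on statsmodels for trendline = 'ols'); I'm using np.polyfit with degree 1 as a stand-in for a simple least-squares line, and the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up long-format data: two categories sharing the same index values
df_long = pd.DataFrame({
    "index": [0, 1, 2, 3, 0, 1, 2, 3],
    "variable": ["AAPL"] * 4 + ["MSFT"] * 4,
    "value": [1.0, 2.0, 3.0, 4.0, 2.0, 2.5, 3.0, 3.5],
})

# Fit one straight line per category, analogous to one trendline per trace
fits = {}
for name, group in df_long.groupby("variable"):
    # polyfit returns coefficients from highest degree down: slope, then intercept
    slope, intercept = np.polyfit(group["index"], group["value"], deg=1)
    fits[name] = (slope, intercept)
    print(f"{name}: value ~ {slope:.2f} * index + {intercept:.2f}")
```

The grouping column plays the same role here as color = 'variable' does in px.scatter: it decides which rows belong to which line.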
Any pros and cons?
Arguably the only advantage of the wide format is that it's easier to read (particularly for those of us damaged by too many years of sub-excellent data handling with Excel). And one great advantage of the long format is that you can easily illustrate more dimensions of the data if you have more categories, for example with different symbols or sizes for the markers.
Another advantage of the long format shows up if the dataset changes, for example with the addition of another variable, 'AMZN'. Then the name and the values of that variable go into the already existing columns instead of into a new column, as they would for the wide format. This means that you actually won't have to change the code in:
fig_long = px.scatter(df_long, x= 'index', y = 'value',
color = 'variable',
trendline = 'ols')
... in order to add the data to the figure.
While for the wide format, you would have to specify y = ['AAPL', 'MSFT', 'AMZN'] in:
fig_wide = px.scatter(df_wide, x = 'index', y = ['AAPL', 'MSFT', 'AMZN'],
trendline = 'ols',
)
And I would strongly argue that this outweighs the slight inconvenience of specifying color = 'variable' in:
fig_long = px.scatter(df_long, x= 'index', y = 'value',
color = 'variable',
trendline = 'ols')
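Here is a small sketch of why the long code can stay untouched. It compares the column names before and after adding AMZN, using a made-up two-row sample instead of the real stock data.

```python
import pandas as pd

# Made-up values, for illustration only
wide_two = pd.DataFrame({"index": [0, 1], "AAPL": [1.00, 1.01], "MSFT": [1.00, 1.02]})
wide_three = wide_two.assign(AMZN=[1.00, 1.06])

long_two = pd.melt(wide_two, id_vars="index")
long_three = pd.melt(wide_three, id_vars="index")

# Wide format: a new variable means a new column, so the y list has to change
print(list(wide_two.columns), "->", list(wide_three.columns))

# Long format: same three columns, only more rows, so the arguments stay valid
print(list(long_two.columns), "->", list(long_three.columns))
print(len(long_two), "->", len(long_three))
```

Since x = 'index', y = 'value' and color = 'variable' refer to columns that exist in both long frames, the same px.scatter call works for either one.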
Plot 2 - A new variable:

Complete code
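# imports
import pandas as pd
import plotly.express as px  # trendline = 'ols' also requires statsmodels to be installed

# data
df = px.data.stocks()
# df.date = pd.to_datetime(df.date)

# wide format: one column per stock, plus the row number as an 'index' column
df_wide = df.drop(['date', 'GOOG', 'AMZN', 'NFLX', 'FB'], axis = 1).reset_index()
# alternative that keeps 'AMZN':
# df_wide = df.drop(['date', 'GOOG', 'NFLX', 'FB'], axis = 1).reset_index()

# long format: columns 'index', 'variable' and 'value'
df_long = pd.melt(df_wide, id_vars = 'index')
df_long

# wide data: one trace and trendline per column listed in y
fig_wide = px.scatter(df_wide, x = 'index', y = ['AAPL', 'MSFT'],
                      trendline = 'ols',
                      )

# long data: one trace and trendline per category in 'variable'
fig_long = px.scatter(df_long, x= 'index', y = 'value',
                      color = 'variable',
                      trendline = 'ols')

# fig_long.show()
fig_wide.show()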