I am trying to plot the year-over-year (YoY) correlation of a DataFrame in Python. How does one get the 3 pairwise correlation coefficients, one for each pair of the variables "AAPL", "IBM" and "MSFT", for each year, and then plot them with matplotlib?
How does one calculate a correlation by row? .corrwith seems to be what is suggested, but it is not working here:
https://www.geeksforgeeks.org/python-pandas-dataframe-corrwith/
I managed to get to a pandas DataFrame where each row represents a year and each element represents the cumulative price over that year. I would like to take the correlations of the cumulative YoY prices and then plot them as a function of time.
The data looks like:
             AAPL           IBM         MSFT
Year
2003   333.392142  21429.009979  6585.475002
2004   637.586428  22862.419960  6837.309986
2005  1678.695713  21121.199997  6519.779993
2006  2545.412858  20827.630028  6592.800003
2007  4603.665710  26528.350021  7638.409990
2008  5143.625731  27841.030014  6755.059990
2009  5278.287136  27444.059998  5779.759998
2010  9312.338573  33034.919891  6795.050001
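For reference, here is a minimal sketch of what `.corr()` and `.corrwith()` return on the yearly table. The values are typed in by hand from the table, so nothing is downloaded. Both calls treat the eight years as the observations, so each gives a single coefficient per pair or per column across all years, not one coefficient per year. That may be why `.corrwith` does not seem to do what is wanted here.

```python
import pandas as pd

# Yearly cumulative prices, copied by hand from the table in the question
dfYearly = pd.DataFrame(
    {
        "AAPL": [333.392142, 637.586428, 1678.695713, 2545.412858,
                 4603.665710, 5143.625731, 5278.287136, 9312.338573],
        "IBM": [21429.009979, 22862.419960, 21121.199997, 20827.630028,
                26528.350021, 27841.030014, 27444.059998, 33034.919891],
        "MSFT": [6585.475002, 6837.309986, 6519.779993, 6592.800003,
                 7638.409990, 6755.059990, 5779.759998, 6795.050001],
    },
    index=pd.Index(range(2003, 2011), name="Year"),
)

# 3x3 matrix: one Pearson coefficient per pair, computed across all 8 years
pairwise = dfYearly.corr()

# Correlation of each column with the AAPL column, again across all 8 years
vs_aapl = dfYearly.corrwith(dfYearly["AAPL"])

print(pairwise)
print(vs_aapl)
```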
The final plot is meant to look like this,
To summarize the question: How does one take the following data, calculate the 3 pairwise correlations for each year and then use matplotlib in order to plot the results?
The code to import the data and manipulate it so far is provided below. Note that yfinance was used to load the data:
#!pip install yfinance
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
ticker_Symbol = ["AAPL", "MSFT", "IBM"]
start_date = '2003-01-01'
end_date = '2010-12-31'
df5 = yf.download(ticker_Symbol, start=start_date, end=end_date)
# Keep only the opening prices: one column per ticker, indexed by date
df = df5["Open"]
print(df.head(3))
# Sum the daily opening prices within each calendar year
dfYearly = df.groupby(df.index.year).sum()
dfYearly.index.name = "Year"
dfYearly
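One caveat: after summing, each ticker has only a single number per year, so a per-year correlation cannot be computed from the yearly sums alone. One possible reading of the goal, sketched below, is to correlate the daily prices within each calendar year and plot the three resulting series. This is an assumption about what is wanted, not a confirmed answer. The helper name `yearly_pairwise_corr` is hypothetical, and the prices are synthetic random walks so that the sketch runs without a download.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def yearly_pairwise_corr(prices: pd.DataFrame) -> pd.DataFrame:
    """Return one row per year and one column per pair of price columns.

    `prices` is assumed to have a DatetimeIndex and one column per ticker.
    """
    years, records = [], []
    for year, group in prices.groupby(prices.index.year):
        corr = group.corr()  # Pearson correlation of daily prices in this year
        records.append(
            {f"{a}-{b}": corr.loc[a, b] for a, b in combinations(prices.columns, 2)}
        )
        years.append(year)
    return pd.DataFrame(records, index=pd.Index(years, name="Year"))


# Synthetic stand-in for the daily opening prices (assumption: with the real
# data this would be something like df5["Open"] from the download step).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2003-01-01", "2010-12-31")
steps = rng.normal(size=(len(dates), 3))
prices = pd.DataFrame(
    100 + steps.cumsum(axis=0), index=dates, columns=["AAPL", "IBM", "MSFT"]
)

yearly_corr = yearly_pairwise_corr(prices)

# One line per ticker pair, with the year on the x-axis
fig, ax = plt.subplots()
yearly_corr.plot(ax=ax, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Correlation of daily prices")
ax.set_title("Pairwise correlation by year")
```

Calling `plt.show()` afterwards displays the figure. Correlating daily returns (for example `prices.pct_change()`) instead of price levels is a common alternative, since price levels that trend together tend to inflate the coefficient.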