This is a follow-up to my previous question. I have a dataframe like this:
Company_id  year  dummy_1  dummy_2  dummy_3  dummy_4  dummy_5
1           1990  1        0        1        1        1
1           1991  0        0        1        1        0
1           1992  0        0        1        1        0
1           1993  1        0        1        1        0
1           1994  0        1        1        1        0
1           1995  0        0        1        1        0
1           1996  0        0        1        1        1
I created a column of NumPy arrays with:
df = df.assign(vector = df.iloc[:, -5:].values.tolist())
df['vector'] = df['vector'].apply(np.array)
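For reference, here is a minimal sketch that rebuilds the setup on the sample rows above and inspects the resulting column. The data are only the seven rows shown, not my real dataframe, and the shape checks at the end are my own addition for illustration. It suggests that `np.array` applied to the column gives a 1-D object array (one array per cell) rather than a 2-D matrix, while `np.stack` gives a regular matrix.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table above (illustrative only).
df = pd.DataFrame({
    "Company_id": [1] * 7,
    "year": list(range(1990, 1997)),
    "dummy_1": [1, 0, 0, 1, 0, 0, 0],
    "dummy_2": [0, 0, 0, 0, 1, 0, 0],
    "dummy_3": [1] * 7,
    "dummy_4": [1] * 7,
    "dummy_5": [1, 0, 0, 0, 0, 0, 1],
})

# Collect the last five columns into one list per row, then convert to arrays.
df = df.assign(vector=df.iloc[:, -5:].values.tolist())
df["vector"] = df["vector"].apply(np.array)

# Each cell holds its own 1-D array, so the column has dtype object.
print(df["vector"].dtype)

# Converting the column directly yields a 1-D array of arrays.
as_object_array = np.array(df["vector"])
print(as_object_array.shape)

# Stacking the cells yields a regular (n_rows, 5) numeric matrix.
as_matrix = np.stack(df["vector"].to_list())
print(as_matrix.shape)
```

On this sample the two shapes come out as `(7,)` and `(7, 5)` respectively.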
I want to measure each company's distinctiveness, in terms of its strategic practices, relative to its rivals over the last 5 years. Here is the code that I use:
df.sort_values('year', ascending=False)
# These will be our lists of differences.
diffs = []
# Loop over all unique dates
for date in df.year.unique():
    # Only take dates earlier than current date.
    compare_df = df.loc[df.year - date <= 5 ].copy()
    # Loop over each company for this date
    for row in df.loc[df.year == date].itertuples():
        # If no data available use nans.
        if compare_df.empty:
            diffs.append(float('nan'))
        # Calculate cosine and fill in otherwise
        else:
            compare_df['distinctivness'] = spatial.distance.cosine(np.array(compare_df.vector) , np.array(row.vector))
            row_of_interest = compare_df.distinctivness.mean()
            diffs.append(row_of_interest.distinctivness.values[0])
However, I get the following error:
compare_df['distinctivness'] = spatial.distance.cosine(np.array(compare_df.vector) - np.array(row.vector))
ValueError: operands could not be broadcast together with shapes (29254,) (93,)
How can I fix this error?
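To make the goal concrete, here is a minimal sketch of the per-row quantity I think I am after: the mean cosine distance between one company's vector and a set of comparison vectors. The helper `mean_cosine_distance` is hypothetical (it is not in my code above), and the toy vectors are taken from the sample rows purely as stand-ins for rival vectors. It assumes the comparison vectors can be stacked into a 2-D matrix and uses `scipy.spatial.distance.cdist`, since `spatial.distance.cosine` is documented as taking two 1-D arrays.

```python
import numpy as np
from scipy.spatial.distance import cdist


def mean_cosine_distance(target, others):
    """Mean cosine distance between one 1-D vector and each row of a 2-D matrix.

    Hypothetical helper for illustration only.
    """
    # cdist expects two 2-D inputs, so the single vector becomes a 1-row matrix.
    target = np.asarray(target, dtype=float).reshape(1, -1)
    others = np.asarray(others, dtype=float)
    # Result has shape (1, n_rows): one distance per comparison vector.
    return cdist(target, others, metric="cosine").mean()


# Toy vectors from the 1990, 1991 and 1994 sample rows (stand-ins for rivals).
focal = np.array([1, 0, 1, 1, 1])
rivals = np.stack([
    np.array([0, 0, 1, 1, 0]),
    np.array([0, 1, 1, 1, 0]),
])

result = mean_cosine_distance(focal, rivals)
print(result)
```

Inside the loop, the matrix would presumably come from something like `np.stack(compare_df.vector.to_list())`, but I have not verified that this is the right way to do it on the full data.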