This is a follow-up to my previous question. I have a dataframe like this:

Company_id  year  dummy_1 dummy_2 dummy_3 dummy_4 dummy_5
1           1990   1       0        1        1      1
1           1991   0       0        1        1      0
1           1992   0       0        1        1      0
1           1993   1       0        1        1      0
1           1994   0       1        1        1      0
1           1995   0       0        1        1      0
1           1996   0       0        1        1      1

I created a column of NumPy arrays from the dummy columns with:

df = df.assign(vector = df.iloc[:, -5:].values.tolist())
df['vector'] = df['vector'].apply(np.array)

I want to measure each company's distinctiveness in terms of its strategic practices, compared to rivals over the last 5 years. Here is the code that I use:

df.sort_values('year', ascending=False)



# These will be our lists of differences.
diffs = []

# Loop over all unique dates
for date in df.year.unique():
    # Only take dates earlier than the current date.
    compare_df = df.loc[df.year - date <= 5 ].copy()
    # Loop over each company for this date
    for row in df.loc[df.year == date].itertuples():
        # If no data available use nans.
        if compare_df.empty:
            diffs.append(float('nan'))
        # Calculate cosine and fill in otherwise
        else:
            compare_df['distinctivness'] = spatial.distance.cosine(np.array(compare_df.vector) , np.array(row.vector))
            row_of_interest = compare_df.distinctivness.mean()
            diffs.append(row_of_interest.distinctivness.values[0])

However, I get

    compare_df['distinctivness'] = spatial.distance.cosine(np.array(compare_df.vector) - np.array(row.vector))

ValueError: operands could not be broadcast together with shapes (29254,) (93,) 

How could I get rid of this problem?

Dogukan Yılmaz
  • which line triggers the error? – Yuca Aug 30 '18 at 14:36
  • check the lengths of `compare_df.vector` and `row.vector`; if they're not the same length then you can't use `spatial.distance.cosine` – DrBwts Aug 30 '18 at 14:38
  • There are no NaN values in the vectors, and they all have the same size, since I created the vectors manually – Dogukan Yılmaz Aug 30 '18 at 14:40
  • not sure what you want to achieve here, but if you want to apply `spatial.distance.cosine` to *each* row of `compare_df`, you have to change that line to: `compare_df['distinctivness'] = compare_df.apply(lambda t, r: spatial.distance.cosine(t.vector, r), axis=1, r=row.vector)` – fernandezcuesta Aug 30 '18 at 15:15
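
A likely reading of the error: `np.array(compare_df.vector)` is a 1-D object array with one entry per row (29254 here), each entry being a vector of length 93, so it cannot be broadcast against a single `(93,)` vector. The following is only a minimal sketch of the row-wise idea from the last comment, not a confirmed fix. It stacks the stored vectors into a 2-D array and uses `scipy.spatial.distance.cdist` with `metric="cosine"`. The helper name `mean_cosine_distance` and the window rule (rows from the previous 1 to 5 years) are assumptions about the intent, and the sample data has a single company, so no filtering for rivals is shown.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Sample data from the question (one company, five dummy columns).
df = pd.DataFrame({
    "Company_id": [1] * 7,
    "year": [1990, 1991, 1992, 1993, 1994, 1995, 1996],
    "dummy_1": [1, 0, 0, 1, 0, 0, 0],
    "dummy_2": [0, 0, 0, 0, 1, 0, 0],
    "dummy_3": [1] * 7,
    "dummy_4": [1] * 7,
    "dummy_5": [1, 0, 0, 0, 0, 0, 1],
})

# Build the 'vector' column the same way as in the question.
df = df.assign(vector=df.iloc[:, -5:].values.tolist())
df["vector"] = df["vector"].apply(np.array)


def mean_cosine_distance(frame, row, window=5):
    """Hypothetical helper: mean cosine distance between `row.vector` and
    the vectors of rows from the previous `window` years (assumed intent)."""
    mask = (row.year - frame.year).between(1, window)
    if not mask.any():
        # No earlier rows inside the window: nothing to compare against.
        return float("nan")
    # Stack the per-row arrays into one 2-D array of shape (n_rows, n_dummies).
    others = np.stack(frame.loc[mask, "vector"].to_numpy()).astype(float)
    target = np.asarray(row.vector, dtype=float).reshape(1, -1)
    # cdist returns an (n_rows, 1) array of cosine distances.
    return cdist(others, target, metric="cosine").mean()


diffs = [mean_cosine_distance(df, row) for row in df.itertuples()]
df["distinctivness"] = diffs
```

Because `diffs` is built in the same row order as `df`, it can be assigned back as a column directly. Note that the cosine distance is undefined for an all-zero vector, so such rows would need separate handling.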

0 Answers