How to efficiently calculate only the upper triangle of this operation in Python?

Question

I am doing a calculation that measures the difference between values in a pd.Series. Although it is a vectorized operation done in one go, it seems inefficient because it computes the lower triangle as well as the upper one (the lower triangle is just the upper values * -1). I want the upper triangle only.

How can I calculate only the values in the upper triangle (not index them post hoc)?

I can convert the pandas to numpy if it will speed up the operation significantly.

import numpy as np
import pandas as pd

profile = np.log(pd.Series({'Attr000001': 17511, 'Attr000002': 4, 'Attr000003': 8078, 'Attr000004': 1, 'Attr000005': 1716}))
idx_attrs = profile.index

# Each row holds profile[id_attr] minus every value in profile.
d_ratio = dict()
for id_attr in idx_attrs:
    d_ratio[id_attr] = (profile[id_attr] - profile).to_dict()
df_ratio = pd.DataFrame(d_ratio).T
# print(df_ratio)
#             Attr000001  Attr000002  Attr000003  Attr000004  Attr000005
# Attr000001    0.000000    8.384290    0.773685    9.770585    2.322833
# Attr000002   -8.384290    0.000000   -7.610605    1.386294   -6.061457
# Attr000003   -0.773685    7.610605    0.000000    8.996900    1.549148
# Attr000004   -9.770585   -1.386294   -8.996900    0.000000   -7.447751
# Attr000005   -2.322833    6.061457   -1.549148    7.447751    0.000000

With `numpy` building blocks it can be more expensive to select the upper triangle than to calculate the redundant values. That said, `scipy.spatial.distance` has a `pdist` function that calculates what it calls condensed distance matrices (and can convert to/from the full square form). I haven't looked at how it works or how much time it saves. — hpaulj, Jul 03 '18 at 19:44
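For illustration, here is a minimal sketch of what the `pdist` route might look like on the question's data. The name `values` is just a placeholder for the log-transformed numbers. One caveat: for one-dimensional points the Euclidean distance is the absolute difference, so this gives `|a - b|` and loses the sign that the original loop keeps.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Same log-transformed values as in the question, as a plain NumPy array.
values = np.log(np.array([17511, 4, 8078, 1, 1716], dtype=float))

# pdist expects a 2-D array of observations, so treat each value as a 1-D point.
# For 1-D points the Euclidean distance is |a - b|, so the sign is lost.
condensed = pdist(values[:, None], metric="euclidean")

# The condensed vector holds only the n*(n-1)/2 pairs with i < j.
# squareform expands it to a symmetric n x n matrix with a zero diagonal.
full = squareform(condensed)
```

Whether this is actually faster than plain broadcasting on a given input size is something to measure rather than assume.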

score 4 · Accepted Answer · answered Jul 03 '18 at 19:48

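Avoid the Python for loop. In numpy this is just broadcasting (convert the Series to an array first, since recent pandas versions no longer allow `profile[:, None]` on a Series):

>>> arr = profile.to_numpy()
>>> arr[:, None] - arr[None, :]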
array([[ 0.        ,  8.38429017,  0.77368494,  9.77058453,  2.32283325],
       [-8.38429017,  0.        , -7.61060524,  1.38629436, -6.06145692],
       [-0.77368494,  7.61060524,  0.        ,  8.9968996 ,  1.54914832],
       [-9.77058453, -1.38629436, -8.9968996 ,  0.        , -7.44775128],
       [-2.32283325,  6.06145692, -1.54914832,  7.44775128,  0.        ]])

`np.triu(profile[:,None] - profile[None,:])` to display only the upper triangle. — Scott Boston, Jul 03 '18 at 19:54
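Note that `np.triu` masks the result after the full matrix has been computed. If the goal is to perform only the upper-triangle subtractions, one possible sketch uses `np.triu_indices` to build the index pairs first. The names `values`, `rows`, `cols`, `upper` and `square` are placeholders for illustration. Building the index arrays has its own cost, so this is not guaranteed to beat plain broadcasting.

```python
import numpy as np

# Same log-transformed values as in the question, as a plain NumPy array.
values = np.log(np.array([17511, 4, 8078, 1, 1716], dtype=float))
n = len(values)

# Row and column indices of the strict upper triangle (k=1 skips the diagonal).
rows, cols = np.triu_indices(n, k=1)

# Only n*(n-1)/2 subtractions are performed here, one per (i, j) pair with i < j.
upper = values[rows] - values[cols]

# Optional: scatter into a square matrix, leaving the lower triangle as zeros.
square = np.zeros((n, n))
square[rows, cols] = upper
```

The flat `upper` array lists the pairs row by row: (0, 1), (0, 2), ..., (0, n-1), (1, 2), and so on. Keep `rows` and `cols` around if you need to map the values back to attribute labels.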
