I am doing a calculation that measures the pairwise differences between the values in a pd.Series. Although it is a vectorized operation and done in one go, I feel that it is inefficient because it calculates the values for the lower triangle as well as the upper (the lower triangle is just the upper triangle multiplied by -1). I want the upper triangle only.

How can I calculate only the values for the upper triangle (not compute everything and index them post hoc)?

I can convert the pandas object to numpy if that will speed up the operation significantly.

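import numpy as np
import pandas as pd

profile = np.log(pd.Series({'Attr000001': 17511, 'Attr000002': 4, 'Attr000003': 8078, 'Attr000004': 1, 'Attr000005': 1716}))
idx_attrs = profile.index
d_ratio = dict()
# For each attribute, subtract the whole Series from that attribute's log value
for id_attr in idx_attrs:
    d_ratio[id_attr] = (profile[id_attr] - profile).to_dict()
# Transpose so that row i, column j holds profile[i] - profile[j]
df_ratio = pd.DataFrame(d_ratio).T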
# print(df_ratio)
#             Attr000001  Attr000002  Attr000003  Attr000004  Attr000005
# Attr000001    0.000000    8.384290    0.773685    9.770585    2.322833
# Attr000002   -8.384290    0.000000   -7.610605    1.386294   -6.061457
# Attr000003   -0.773685    7.610605    0.000000    8.996900    1.549148
# Attr000004   -9.770585   -1.386294   -8.996900    0.000000   -7.447751
# Attr000005   -2.322833    6.061457   -1.549148    7.447751    0.000000
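
To make the goal concrete, here is a minimal sketch of the kind of result I am after, assuming a conversion to NumPy is acceptable. It builds the (i, j) index pairs above the diagonal with np.triu_indices and subtracts only those pairs, so the lower triangle is never computed. The names values, rows, cols, upper_values and df_upper are only illustrative, and I have not benchmarked this against the loop above.

```python
import numpy as np
import pandas as pd

profile = np.log(pd.Series({'Attr000001': 17511, 'Attr000002': 4, 'Attr000003': 8078, 'Attr000004': 1, 'Attr000005': 1716}))

values = profile.to_numpy()
n = len(values)

# Index pairs (i, j) with i < j, i.e. strictly above the diagonal
rows, cols = np.triu_indices(n, k=1)

# Only n * (n - 1) / 2 subtractions are performed here
upper_values = values[rows] - values[cols]

# Optional: put the results back into a labelled frame;
# the diagonal and the lower triangle are left as NaN
upper = np.full((n, n), np.nan)
upper[rows, cols] = upper_values
df_upper = pd.DataFrame(upper, index=profile.index, columns=profile.index)
```

If the labelled frame is not needed, upper_values together with rows and cols already holds every pairwise difference once, in row-major order.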