
Suppose we have the following two Series:

import pandas as pd

s1 = pd.Series([1, 1, 1], index=['B', 'A', 'C'])
s2 = pd.Series([2, 3, 1], index=['B', 'C', 'A'])

Note that the indexes of s1 and s2 contain the same three labels, but in different orders.

Multiplying the two Series s1 * s2 gives:

A    1
B    2
C    3

where the resulting index is sorted alphabetically.

This comes as a surprise to me because I would expect to get:

B    2
A    1
C    3

preserving the index order of the multiplicand s1 (or maybe that of the multiplier s2).

I get similar results using DataFrames as well as other arithmetic operations (+, -, and /) and their corresponding methods, like .mul().
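For reference, a minimal sketch of what "similar results" looks like across operators, a method call, and DataFrames. The column name `x` is arbitrary and only there for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 1, 1], index=['B', 'A', 'C'])
s2 = pd.Series([2, 3, 1], index=['B', 'C', 'A'])

# Operators and their method equivalents all align on labels first.
results = {
    '*': s1 * s2,
    '+': s1 + s2,
    '-': s1 - s2,
    '/': s1 / s2,
    '.mul()': s1.mul(s2),
}
for name, res in results.items():
    print(name, list(res.index))

# The same happens with single-column DataFrames built from the Series.
df1 = s1.to_frame('x')
df2 = s2.to_frame('x')
df_product = df1 * df2
print(list(df_product.index))
```

In every case the printed index comes back sorted rather than in the order of either operand.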

I figured that reindexing s2 is an easy solution, for example:

s1 * s2.reindex_like(s1)
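As a sketch, here are two ways to end up with s1's order. Both assume the two indexes hold exactly the same unique labels; if they do not, reindexing will drop labels or introduce NaN.

```python
import pandas as pd

s1 = pd.Series([1, 1, 1], index=['B', 'A', 'C'])
s2 = pd.Series([2, 3, 1], index=['B', 'C', 'A'])

# Option 1: put s2 into s1's label order before multiplying.
by_reindex = s1 * s2.reindex_like(s1)

# Option 2: multiply first, then restore s1's label order afterwards.
by_restore = (s1 * s2).reindex(s1.index)

print(by_reindex)
print(by_restore)
```

Option 1 avoids the realignment altogether, while option 2 can be easier to bolt onto a longer chain of operations.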

Nonetheless, sorting misaligned indices in alphabetical order seems arbitrary to me. I'm not able to find a good description of this pandas behaviour online. Can somebody explain?

gergoing
  • Pandas matches the values based on the indexes, not position, when doing the multiplication. If you don't want that behavior, you should reset the indexes, or just multiply the underlying value arrays rather than the two Series. For example: `s1.values * s2.values` or even `s1 * s2.values` (as in comments to [this](https://stackoverflow.com/questions/31708959/multiply-two-pandas-series-with-mismatched-indices) question) – topsail Aug 06 '23 at 18:29
  • *Nonetheless, sorting misaligned indices in alphabetic order seems arbitrary to me* Note that it is probably a mistake to think of indexes as having "order". They are labels for the values. They don't imply an order, nor, as far as I know, is pandas required to maintain any order in an index. For what it's worth, if the indexes can be ignored when multiplying the two series, it may be that they are not really necessary after all, since they don't seem to have any real meaning as far as labeling the values goes. – topsail Aug 06 '23 at 18:48
  • @topsail I don't think your two comments are related to what OP's asking. OP is aware of index alignment, but is asking why s1*s2 has a lexicographically sorted index instead of s1.index. – Quang Hoang Aug 06 '23 at 21:51
  • @gergoing I'm not a dev, but building a binary tree from s1's index and then inserting s2's index seems natural. In that case, the resulting series is sorted by the ordering of the index dtype. **Note**: this works for the general case. For example, if s1 is indexed by BACD and s2 is indexed by BECA, s1*s2 is expected to have index ABCDE. – Quang Hoang Aug 06 '23 at 21:58
  • @QuangHoang Thanks, that's a perfect rephrasing of my question. I also see your point about the 'general' case. I guess I just need to beware that lexicographical ordering is the default when there is any mismatch of indexes, and manually adjust if necessary. – gergoing Aug 07 '23 at 07:52
  • Well, I still think it's a mistake to rely on ordering. The pandas documentation on [multiply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mul.html) makes no guarantees about order (and doesn't even mention ordering). It may be natural to expect an order, but that is not the same thing as a guarantee of an order. However, it is possible the documentation is incomplete (or let's say this could be a "reliable" side effect of the implementation) ... so perhaps I'm wrong and it's fine to expect an order (as long as you expect the right one). – topsail Aug 07 '23 at 12:53
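
The general case raised in the comments (BACD and BECA) can be checked with a short sketch. The values are made up for illustration, and the remark that identical indexes skip the re-sorting reflects observed behaviour in current pandas versions rather than a documented guarantee. `Index.union(..., sort=False)` is used here as one possible way to get a "first seen" label order.

```python
import pandas as pd

# Hypothetical values; only the index labels come from the discussion above.
s1 = pd.Series([1, 1, 1, 1], index=list('BACD'))
s2 = pd.Series([2, 3, 1, 5], index=list('BECA'))

# Differing indexes: the result is indexed by the union of the labels,
# with NaN wherever a label is missing from one operand.
product = s1 * s2
print(product)

# Identical indexes: no realignment is needed, so the order is kept.
same = s1 * s1
print(list(same.index))

# To avoid depending on the sorted order, build the wanted order explicitly:
# s1's labels first, then any labels that only appear in s2.
first_seen = s1.index.union(s2.index, sort=False)
ordered = product.reindex(first_seen)
print(list(ordered.index))
```

Reindexing explicitly like this keeps the code independent of whatever order the alignment step happens to produce.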

0 Answers