
I have a pandas dataframe with data in 3 columns that look like this:

Column 1    Column 2       Column 3
m26:1_8.6                  m26:1_8.6
FA8:0       329448.0313    FA18:1
FA18:1      4048785.5      FA12:0
FA10:0      2232634.75     FA10:0
FA12:0      2541154.25     FA8:0
m26:0_8.8                  m26:0_8.8
FA8:0       2084220.75     FA10:0
FA10:0      5801817        FA12:0
FA2:0       158898.5625    FA8:0
FA12:0      3270664.25     FA2:0

Each "block" of data is separated by a row containing 'm', or a blank in the second column. The values in column 2 are "tagged" to column 1. I need to rearrange columns 1 and 2 according to the order of column 3, i.e., column 1 equals column 3 and the corresponding data in column 2 is also rearranged with the same positions as column 1.

The desired output is as follows:

Column 1    Column 2       Column 3
m26:1_8.6                  m26:1_8.6
FA18:1      4048785.5      FA18:1
FA12:0      2541154.25     FA12:0
FA10:0      2232634.75     FA10:0
FA8:0       329448.0313    FA8:0
m26:0_8.8                  m26:0_8.8
FA10:0      5801817        FA10:0
FA12:0      3270664.25     FA12:0
FA8:0       2084220.75     FA8:0
FA2:0       158898.5625    FA2:0
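
For reference, a minimal sketch of the sample data as a DataFrame (assuming the blank Column 2 cells are read in as NaN):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column 1': ['m26:1_8.6', 'FA8:0', 'FA18:1', 'FA10:0', 'FA12:0',
                 'm26:0_8.8', 'FA8:0', 'FA10:0', 'FA2:0', 'FA12:0'],
    'Column 2': [np.nan, 329448.0313, 4048785.5, 2232634.75, 2541154.25,
                 np.nan, 2084220.75, 5801817, 158898.5625, 3270664.25],
    'Column 3': ['m26:1_8.6', 'FA18:1', 'FA12:0', 'FA10:0', 'FA8:0',
                 'm26:0_8.8', 'FA10:0', 'FA12:0', 'FA8:0', 'FA2:0'],
})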

I have tried several approaches; this was my latest attempt:

# Loop through the original dataframe row by row
for i, row in df.iterrows():
    # Check if it is a header row
    if pd.isna(row['Column 2']):
        # Add the header row to the new dataframe
        new_df = new_df.append(row)
    else:
        # Extract the values from columns 1, 2, and 3
        label = row['Column 1']
        value = row['Column 2']
        position = row['Column 3']
        # Find the index of the label in column 3
        index = new_df.index[new_df['Column 1'] == position][0]
        # Insert a new row in the new dataframe with the values from the original row
        new_row = pd.DataFrame({
            'Column 3': [position],
            'Column 1': [label],
            'Column 2': [value]
        }, index=[index])
        new_df = pd.concat([new_df.iloc[:index], new_row, new_df.iloc[index:]])

print(new_df)

However, it gave me the error IndexError: index 0 is out of bounds for axis 0 with size 0

Could someone help me resolve this? Thank you!

Megan Lee

2 Answers


For each group, assuming Column 1 values are unique:

import numpy as np

def sort_group(df):
    # For each value of Column 3, find the row position of the matching
    # value in Column 1, then reorder the block accordingly
    x = df['Column 1'].values
    y = df['Column 3'].values
    xsorted = np.argsort(x)
    ypos = np.searchsorted(x[xsorted], y)
    indices = xsorted[ypos]
    return df.drop(columns='Column 3').iloc[indices]

# https://stackoverflow.com/a/8251668/15239951
# Group rows into blocks (each block starts with an 'm…' row) and reorder each one
out = (df.groupby(df['Column 1'].str.startswith('m').cumsum())
         .apply(sort_group).droplevel(0))

Output:

>>> out
    Column 1      Column 2
0  m26:1_8.6           NaN
2     FA18:1  4.048786e+06
4     FA12:0  2.541154e+06
3     FA10:0  2.232635e+06
1      FA8:0  3.294480e+05
5  m26:0_8.8           NaN
7     FA10:0  5.801817e+06
9     FA12:0  3.270664e+06
6      FA8:0  2.084221e+06
8      FA2:0  1.588986e+05
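
The argsort/searchsorted combination (from the linked answer) looks up, for each Column 3 value, the row position of the matching Column 1 value. A standalone sketch of that lookup, using hypothetical arrays standing in for a single block:

import numpy as np

x = np.array(['FA8:0', 'FA18:1', 'FA10:0', 'FA12:0'])   # current order (Column 1)
y = np.array(['FA18:1', 'FA12:0', 'FA10:0', 'FA8:0'])   # target order (Column 3)

xsorted = np.argsort(x)                # positions that sort x
ypos = np.searchsorted(x[xsorted], y)  # where each y value falls in sorted x
indices = xsorted[ypos]                # row positions of y's values within x

print(indices)     # [1 3 2 0]
print(x[indices])  # ['FA18:1' 'FA12:0' 'FA10:0' 'FA8:0']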
Corralien

Another option is to use a self-merge, matching Column 1 to Column 3 within each block:

# Label each block: a new block starts on an 'm…' row / blank Column 2
grp = ((df["Column 1"].str.startswith("m")) | (df["Column 2"].isnull())).cumsum()

# Self-merge within each block: match Column 1 (left) against Column 3 (right)
out = (df.merge(df.drop(columns=["Column 1", "Column 2"]),
                left_on=[grp, "Column 1"], right_on=[grp, "Column 3"],
                suffixes=("_", ""))[df.columns]
      )

Output:

print(out)

    Column 1   Column 2   Column 3
0  m26:1_8.6        NaN  m26:1_8.6
1      FA8:0  329448.03      FA8:0
2     FA18:1 4048785.50     FA18:1
3     FA10:0 2232634.75     FA10:0
4     FA12:0 2541154.25     FA12:0
5  m26:0_8.8        NaN  m26:0_8.8
6      FA8:0 2084220.75      FA8:0
7     FA10:0 5801817.00     FA10:0
8      FA2:0  158898.56      FA2:0
9     FA12:0 3270664.25     FA12:0
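
For reference, the grp key is just a running block counter, so the merge can only match rows that belong to the same block. On the sample data above it would come out as:

grp = ((df["Column 1"].str.startswith("m")) | (df["Column 2"].isnull())).cumsum()
print(grp.tolist())   # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2] -> one label per block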
Timeless