I have a pandas dataframe with data in 3 columns that look like this:
Column 1 | Column 2 | Column 3 |
---|---|---|
m26:1_8.6 | m26:1_8.6 | |
FA8:0 | 329448.0313 | FA18:1 |
FA18:1 | 4048785.5 | FA12:0 |
FA10:0 | 2232634.75 | FA10:0 |
FA12:0 | 2541154.25 | FA8:0 |
m26:0_8.8 | m26:0_8.8 | |
FA8:0 | 2084220.75 | FA10:0 |
FA10:0 | 5801817 | FA12:0 |
FA2:0 | 158898.5625 | FA8:0 |
FA12:0 | 3270664.25 | FA2:0 |
Each "block" of data is separated by a row containing 'm', or a blank in the second column. The values in column 2 are "tagged" to column 1. I need to rearrange columns 1 and 2 according to the order of column 3, i.e., column 1 equals column 3 and the corresponding data in column 2 is also rearranged with the same positions as column 1.
The desired output is as follows:
Column 1 | Column 2 | Column 3 |
---|---|---|
m26:1_8.6 | m26:1_8.6 | |
FA18:1 | 4048785.5 | FA18:1 |
FA12:0 | 2541154.25 | FA12:0 |
FA10:0 | 2232634.75 | FA10:0 |
FA8:0 | 329448.0313 | FA8:0 |
m26:0_8.8 | m26:0_8.8 | |
FA10:0 | 5801817 | FA10:0 |
FA12:0 | 3270664.25 | FA12:0 |
FA8:0 | 2084220.75 | FA8:0 |
FA2:0 | 158898.5625 | FA2:0 |
I tried multiple codes but this was my latest attempt:
# Loop through the original dataframe row by row
for i, row in df.iterrows():
# Check if it is a header row
if pd.isna(row['Column 2']):
# Add the header row to the new dataframe
new_df = new_df.append(row)
else:
# Extract the values from columns 1, 2, and 3
label = row['Column 1']
value = row['Column 2']
position = row['Column 3']
# Find the index of the label in column 3
index = new_df.index[new_df['Column 1'] == position][0]
# Insert a new row in the new dataframe with the values from the original row
new_row = pd.DataFrame({
'Column 3': [position],
'Column 1': [label],
'Column 2': [value]
}, index=[index])
new_df = pd.concat([new_df.iloc[:index], new_row, new_df.iloc[index:]])
print(new_df)
However, it gave me the error IndexError: index 0 is out of bounds for axis 0 with size 0
Could I seek some help to resolve this? Thank you!