I have a dataframe where some columns are name-paired (for each column ending with _x there is a corresponding column ending with _y) and others are not. For example:
import pandas as pd
import numpy as np
colnames = [
    'foo', 'bar', 'baz',
    'a_x', 'b_x', 'c_x',
    'a_y', 'b_y', 'c_y',
]
rng = np.random.default_rng(0)
data = rng.random((20, len(colnames)))
df = pd.DataFrame(data, columns=colnames)
Assume I have two lists containing all the column names ending with _x and all the column names ending with _y (it's easy to build such lists), of the same length m (remember that for each _x column there is one and only one corresponding _y column). I want to create m new columns with a simple formula:
df['a_err'] = (df['a_x'] - df['a_y']) / df['a_y']
without hard-coding the column names, of course. It's easy to do so with a for loop, but I would like to know if it's possible to do the same without a loop, in the hope that it would be faster (the real dataframe is way bigger than this small example).
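For reference, the loop-based baseline I have in mind looks roughly like the sketch below. The names x_cols and y_cols are placeholders for the two lists mentioned above, and deriving each _y name from its _x name assumes the strict one-to-one pairing already described.

```python
import numpy as np
import pandas as pd

# Same toy dataframe as in the example above.
colnames = ['foo', 'bar', 'baz', 'a_x', 'b_x', 'c_x', 'a_y', 'b_y', 'c_y']
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, len(colnames))), columns=colnames)

# Build the two paired lists (x_cols and y_cols are placeholder names).
x_cols = [c for c in df.columns if c.endswith('_x')]
y_cols = [c[:-2] + '_y' for c in x_cols]  # same order as x_cols

# Loop baseline: one new *_err column per (_x, _y) pair.
for x, y in zip(x_cols, y_cols):
    df[x[:-2] + '_err'] = (df[x] - df[y]) / df[y]
```

This adds a_err, b_err and c_err to df, and it is the version I would like to replace with something loop-free.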