0

Sorry for the title which is maybe more complicated than the problem itself ;)

I have de following pandas dataframe

    grh  anc     anc1     anc2    anc3     anc4     anc5    anc6     anc7  
1     2    5  0.10000  0.12000  0.1800  0.14000  0.15000  0.1900  0.20000   
2     3    7  0.03299  0.05081  0.0355  0.02884  0.03054  0.0332  0.03115   
3     4    3  0.00000  0.00000  0.0000  0.00000  0.00000  0.0000  0.00000   
4     5    4  0.00000  0.00000  0.0000  0.00000  0.00000  0.0000  0.00000   
5     6    1  0.10000  0.10000  0.1000  0.10000  0.10000  0.1000  0.10000   


       anc8     anc9    anc10  
1   0.10000  0.21000  0.24000  
2   0.02177  0.04903  0.04399  
3   0.00000  0.00000  0.00000  
4   0.00000  0.00000  0.00000  
5   0.10000  0.10000  0.10000  

I would like to add new columns with a forloop lap1, lap2, ....depending on the values of variable anc. For instance, on the first row, anc=5 so lap1 should be equal to the value of anc5 (0.1500), lap2 equal to anc6 (0.1900)...on the second row lap1=anc7 (0.03115), lap2=anc8 (0.02177),...

So, the output should look like

grh anc anc1    anc2    anc3    anc4    anc5    anc6    anc7    anc8    anc9    anc10   lap1    lap2    lap3
2   5   0.10000 0.12000 0.18000 0.14000 0.15000 0.19000 0.20000 0.1000  0.21000 0.24000 0.15000 0.19000 0.20000
3   7   0.03299 0.05081 0.0355  0.02884 0.03054 0.0332  0.03115 0.02177 0.04903 0.04399 0.03115 0.02177 0.04903
4   3   0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
5   4   0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
6   1   0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000

I've tried something very basic, but doesn't seem to work

for i in range(1,4):
    j=df['anc']+i
    df['lap'+str(i)]= df['anc'+str(j)]

I would be very grateful if you have any idea. Thks

Bebio
  • 409
  • 5
  • 12
  • could you provide a sample output of what you want given this dataframe please as detailed here [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) – Haleemur Ali Apr 26 '20 at 15:24
  • 1
    Thanks for your point. I've added the output that i want to obtain. – Bebio Apr 26 '20 at 16:52

3 Answers3

1

set grh & anc as index as we are looking to index into the anc[1-9] columns. This also comes in handy when we write the output columns:

df2 = df.set_index(['grh', 'anc']) 

for each row slice into the columns using the anc value, which is now in the index, taking the 3 adjacent values, convert them to a series with names as you expect in the output and assign them to matching output columns

outcols = ['lap1', 'lap2', 'lap3']
df2[outcols] = df2.apply(lambda x: pd.Series(x[x.name[1]-1:x.name[1]+2].values, index=outcols), axis=1)

df2 looks like this:

            anc1     anc2    anc3     anc4     anc5    anc6     anc7     anc8     anc9    anc10     lap1     lap2     lap3
grh anc
2   5    0.10000  0.12000  0.1800  0.14000  0.15000  0.1900  0.20000  0.10000  0.21000  0.24000  0.15000  0.19000  0.20000
3   7    0.03299  0.05081  0.0355  0.02884  0.03054  0.0332  0.03115  0.02177  0.04903  0.04399  0.03115  0.02177  0.04903
4   3    0.00000  0.00000  0.0000  0.00000  0.00000  0.0000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
5   4    0.00000  0.00000  0.0000  0.00000  0.00000  0.0000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
6   1    0.10000  0.10000  0.1000  0.10000  0.10000  0.1000  0.10000  0.10000  0.10000  0.10000  0.10000  0.10000  0.10000

reset the index again if you like to revert grh & anc back to being columns.


alternative name based look-up instead of positional lookup:

define a utility function to perform the column lookup provided an float. It needs to accept a float because pandas would automatically upcast an int64 to a float64 if the series contains any non-integer values. Use this function to perform lookup & to assign the output. The one benefit of this approach is that no set_index is required.

def cols(n,p): return [f'{p}{i}' for i in range(int(n), int(n+3))] 
df[cols(1, 'lap')] = df.apply(lambda x: pd.Series(x[cols(x.anc, 'anc')].values), axis=1)
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85
  • Mitigates the issues OP had with my answer, yet what if columns are unordered... – Grzegorz Skibinski Apr 26 '20 at 19:06
  • @GrzegorzSkibinski, then OP can sort their columns before applying my procedure – Haleemur Ali Apr 26 '20 at 19:15
  • Thanks for your detailed answer which is very useful to me. However, i still have the following error : ValueError: ('Length of passed values is 1, index implies 3', 'occurred at index (4, 10)') – Bebio Apr 26 '20 at 19:23
  • @Bebio, do you get this error on the sample that you shared or your full dataset? – Haleemur Ali Apr 26 '20 at 19:30
  • @Haleemur Ali - correct which would defeat the purpose of calling by order Vs calling by name :) – Grzegorz Skibinski Apr 26 '20 at 19:39
  • @HaleemurAli, you're completly right. I was on a non consistent sample. Your code works fine on my full dataset. Thanks again for your time. – Bebio Apr 26 '20 at 19:39
  • @GrzegorzSkibinski, I'm unclear on what your point is. Anyhow, see my updated answer for a name-lookup based approach. BTW, your provided attempt has a bug, and would work if `anc{j+x['anc']-1}` is replaced by `anc{int(x['anc']+j-1)}` – Haleemur Ali Apr 26 '20 at 21:03
  • My point was, that positional lookup is a risky approach here, for exactly these reasons. Calling by reference works way better. Cheers, for checking my solution- my test case wasn't representative, I think that's where it came from ;) – Grzegorz Skibinski Apr 27 '20 at 04:46
0

A bit of a 'brute-force' approach, but I can't see how you can do this otherwise:

df[[f"lap{i}" for i in range(1,4)]]= \
    df.apply(lambda x: \
        pd.Series({f"lap{j}": x[f"anc{int(j+x['anc']-1)}"] for j in range(1,4)}) \
    , axis=1)

(Assuming per your sample, that you have max lap at 3)

Grzegorz Skibinski
  • 12,624
  • 2
  • 11
  • 34
0
# Where is the new lap column starting
startingNewColsNumber  = df.shape[1]

# How many new lap columns to add
numNewCols = df.grh.max() 

# Generate new lap columns
newColNames = ['lap'+str(x) for x in range(1, numNewCols + 1)]

# add new lap columns to the dataframe
for lapName in newColNames:
    df[lapName] = np.NaN

# now fill the values for each of rows for the new 'lap' columns 
for row in df.index:
    startCopyCol = df.loc[row,'anc'] + 1   # What is the begening anc value to start copying
    howmany = df.loc[row,'grh']            # How many lap values should I fill
    df.iloc[row, startingNewColsNumber : startingNewColsNumber + howmany]  =  \
    df.iloc[row, startCopyCol : startCopyCol + howmany].values

df 

Here is the output I got:

grh anc anc1    anc2    anc3    anc4    anc5    anc6    anc7    anc8    anc9    anc10   lap1    lap2    lap3    lap4    lap5    lap6
0   2   5   0.10000 0.12000 0.1800  0.14000 0.15000 0.1900  0.20000 0.10000 0.21000 0.24000 0.15000 0.19000 NaN NaN NaN NaN
1   3   7   0.03299 0.05081 0.0355  0.02884 0.03054 0.0332  0.03115 0.02177 0.04903 0.04399 0.03115 0.02177 0.04903 NaN NaN NaN
2   4   3   0.00000 0.00000 0.0000  0.00000 0.00000 0.0000  0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.0 NaN NaN
3   5   4   0.00000 0.00000 0.0000  0.00000 0.00000 0.0000  0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.0 0.0 NaN
4   6   1   0.10000 0.10000 0.1000  0.10000 0.10000 0.1000  0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.1 0.1 0.1

Let me know if this gives some kind of solution for you are looking

theDBA
  • 239
  • 1
  • 5
  • Hi, and thank you for your proposal. It doesn't really match my problem. The number of new lap columns to add is not related to the grh number. Its equal to 3 in the example. – Bebio Apr 27 '20 at 12:34
  • Got it. I did not see that the number of new lap columns are fixed and they are 3. With this new info making just two changes will fix the code. The changes are : # numNewCols = df.grh.max() ## Commented code numNewCols = 3 ## New code # howmany = df.loc[row,'grh'] # How many lap values should I fill ## Commented code howmany = 3 # How many lap values should I fill ## New code – theDBA Apr 27 '20 at 13:52