Here is my code:

# from: https://stackoverflow.com/questions/60101845/compare-multiple-pandas-columns-1st-and-2nd-after-3rd-and-4rth-after-etc-wit
# from: https://stackoverflow.com/questions/27474921/compare-two-columns-using-pandas?answertab=oldest#tab-top
# from: https://stackoverflow.com/questions/60099141/negation-in-np-select-condition
import pandas as pd
import numpy as np

df = pd.DataFrame({'var1': ['a', 'b', 'c', np.nan, np.nan],
                   'var2': [1, 2, np.nan, 4, np.nan],
                   'var3': [np.nan, "x", np.nan, "y", "z"],
                   'var4': [np.nan, 4, np.nan, 5, 6],
                   'var5': ["a", np.nan, "b", np.nan, "c"],
                   'var6': [1, np.nan, 2, np.nan, 3]
                  })



col1 = ["var1", "var3", "var5"]
col2 = ["var2", "var4", "var6"]
colR = ["Result1", "Result2", "Result3"]

s1 = df[col1].isnull().to_numpy()
s2 = df[col2].isnull().to_numpy()

conditions = [~s1 & ~s2, s1 & s2, ~s1 & s2, s1 & ~s2]
choices = ["Both values", np.nan, df[col1], df[col2]]

df = pd.concat([df, pd.DataFrame(np.select(conditions, choices), columns=colR, index=df.index)], axis=1)

The result (df) looks like:

  var1  var2 var3  var4 var5  var6      Result1      Result2      Result3
0    a   1.0  NaN   NaN    a   1.0  Both values          NaN  Both values
1    b   2.0    x   4.0  NaN   NaN  Both values  Both values          NaN
2    c   NaN  NaN   NaN    b   2.0            c          NaN  Both values
3  NaN   4.0    y   5.0  NaN   NaN            4  Both values          NaN
4  NaN   NaN    z   6.0    c   3.0          NaN  Both values  Both values

It works, but it has problems with missing values in choices (more about that here, though it is not that important for my current question). Now, instead of np.select(conditions, choices), I need to use something like the following (the idea is to use pandas instead of numpy in order to avoid the problems described in the link above):

pd.DataFrame(choices).where(conditions).ffill().fillna(0).iloc[-1]

or this:

pd.DataFrame(choices).where(conditions,0).sum()

If I just replace that part of the code, I get an error:

runfile('D:/del/untitlejhgd0.py', wdir='D:/del')
Traceback (most recent call last):

  File "<ipython-input-16-10cdf307d77c>", line 1, in <module>
    runfile('D:/del/untitlejhgd0.py', wdir='D:/del')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/del/untitlejhgd0.py", line 29, in <module>
    df = pd.concat([df, pd.DataFrame((pd.DataFrame(conditions).where(choices).ffill().fillna(0).iloc[-1]), columns=colR, index=df.index)], axis=1)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 488, in __init__
    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 169, in init_ndarray
    values = prep_ndarray(values, copy=copy)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 295, in prep_ndarray
    raise ValueError("Must pass 2-d input")

ValueError: Must pass 2-d input
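
A likely cause, offered as a guess rather than a confirmed diagnosis: the line in the traceback builds pd.DataFrame(conditions), and conditions is a list of four 2-D boolean arrays. Stacked together they form a 3-D array, which the DataFrame constructor rejects. The small arrays a and b below are hypothetical stand-ins for the entries of conditions.

```python
import numpy as np
import pandas as pd

# Two 2-D boolean arrays, standing in for the entries of `conditions`.
a = np.array([[True, False], [False, True]])
b = ~a

# A list of 2-D arrays becomes a 3-D array when stacked.
stacked_ndim = np.array([a, b]).ndim

try:
    pd.DataFrame([a, b])  # 3-D input is not accepted by the constructor
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(stacked_ndim)
print(error_message)
```

If this is what happens, the problem is not in where() or ffill() but in the first step, before any of the chained methods run.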

Question: How should I replace the parts of the code above so that it works?
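
One possible pandas-only direction, as a minimal sketch and not a drop-in for the two expressions above: rename both column groups to the result labels so they align, then chain DataFrame.where and DataFrame.mask instead of calling np.select. The names left, right, has_left, has_right, result and out are made up for this illustration. It assumes the pairing var1/var2, var3/var4, var5/var6 from the code above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': ['a', 'b', 'c', np.nan, np.nan],
                   'var2': [1, 2, np.nan, 4, np.nan],
                   'var3': [np.nan, "x", np.nan, "y", "z"],
                   'var4': [np.nan, 4, np.nan, 5, 6],
                   'var5': ["a", np.nan, "b", np.nan, "c"],
                   'var6': [1, np.nan, 2, np.nan, 3]})

col1 = ["var1", "var3", "var5"]
col2 = ["var2", "var4", "var6"]
colR = ["Result1", "Result2", "Result3"]

# Give both halves the same labels so pandas aligns them cell by cell.
# Casting to object lets strings and numbers share a column.
left = df[col1].astype(object)
left.columns = colR
right = df[col2].astype(object)
right.columns = colR

has_left = left.notna()
has_right = right.notna()

# Keep the left value where it exists, otherwise fall back to the right one
# (which stays NaN when both are missing).
result = left.where(has_left, right)
# Overwrite the cells where both values are present.
result = result.mask(has_left & has_right, "Both values")

out = pd.concat([df, result], axis=1)
print(out)
```

Because the missing values never pass through a mixed list of choices, NaN stays a real NaN here. Numeric values taken from the second group stay floats, so they may print as 4.0 rather than 4.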

vasili111
