5

I want to transfer SAS code to python, and cannot find a retain function in python.

The data is like :

type_id    amount
1           100
1           200
1           400
2           0
1           200
1           300
2           0
1           150

What I want is when type_id = 2, the amount is equal to the negative value of the previous row. So the data will be like this after retain function:

type_id    amount
1           100
1           200
1           400
2          -400
1           200
1           300
2          -300
1           150

The SAS code is :

data B;
set A;
retain tempvar 0;
if type_id = 2
then amount = tempvar;
else tempvar = -amount;
drop tempvar;
run;

Does anyone have any idea about how to do this in python? Thanks!

juanbin
  • 83
  • 1
  • 9
  • 2
    What if there are two type_id=2s in a row (i.e. consecutive)? Or is that not possible? – DSM Apr 14 '17 at 16:38
  • Not very knowledgeable in Pandas but I'm curious as to why you would want negative results. – roganjosh Apr 14 '17 at 16:41
  • It is not possible in this case to have two type_id = 2 continuously. – juanbin Apr 14 '17 at 18:30
  • I want negative results is because, for example, a customer give us a check so I get $400 in my database, but I find the check is bad some days later, I can't get money from it. So I want another -$400 in my database to fix this problem~ – juanbin Apr 14 '17 at 18:33

2 Answers2

3

IIUC

df
type_id amount
0   1   100
1   1   200
2   1   400
3   2   0
4   1   200
5   1   300
6   2   0
7   1   150

def retain(df):
    df['ret'] = df['amount'].shift()
    df.ix[df['type_id']==2,'amount'] = -df.ix[df['type_id']==2,'ret']
    df.drop("ret", axis=1, inplace=True)
    return df

retain(df)
type_id amount
0   1   100.0
1   1   200.0
2   1   400.0
3   2   -400.0
4   1   200.0
5   1   300.0
6   2   -300.0
7   1   150.0

Alternatively:

def retain(df):
    df.amount.ix[df.type_id==2] = - df.amount.shift().ix[df.type_id==2]
    return df

retain(df)
type_id amount
0   1   100.0
1   1   200.0
2   1   400.0
3   2   -400.0
4   1   200.0
5   1   300.0
6   2   -300.0
7   1   150.0
Sergey Bushmanov
  • 23,310
  • 7
  • 53
  • 72
1
# another idea to try 
class retain_memo(object):
    def __init__(self):
        self.value = None
        
    def set_get(self, u):
        self.value = u 
        return u

def retain(series):
    memo = retain_memo()
    return pd.Series([memo.set_get(u) if pd.notnull(u) else memo.value for u in series])


test_series = pd.Series([1,1,2,None,None,4,4,None,None,None,None])
sas_style_retained = retain(test_series)

pd.DataFrame({'original':test_series,'retained':sas_style_retained})
# original  retained
# 0     1.0     1.0
# 1     1.0     1.0
# 2     2.0     2.0
# 3     NaN     2.0
# 4     NaN     2.0
# 5     4.0     4.0
# 6     4.0     4.0
# 7     NaN     4.0
# 8     NaN     4.0
# 9     NaN     4.0
# 10    NaN     4.0

# use something like df.groupby(<id>)[var].transform(retain) 
# for grouped data
  • Pandas actually has a built-in series function named ffill(). So instead of retain(...) stuff above we could simply use series.ffill(). In (non-scientifically) experimenting with ffill() it seems faster for lists longer than 50 elements. – Jerry Watkins Sep 09 '22 at 01:12
  • ffill(...) is especially faster when used with grouped data. – Jerry Watkins Sep 09 '22 at 01:24