import pandas as pd
import patsy
from patsy import dmatrices, dmatrix, demo_data
dt=pd.DataFrame({'F1':['a','b','c','d','e','a'],'F2':['X','X','Y','Y','Z','Z']})

I know I can do this

dmatrix("1+I(F1=='a')",dt)

But can I use an arbitrary function of my own in a patsy formula? I'm trying to mimic the flexibility of the formula language in R, but it doesn't seem straightforward to achieve in Python. For example, this attempt does not work:

def abd(x):
    1 if x in ['a','b','d'] else 0

dmatrix("1+abd(F1)",dt)
U13-Forward
xappppp

2 Answers


IIUC

def abd(x):
    return x.isin(['a','b','d'])
dmatrix("1+abd(F1)",dt)
Out[182]: 
DesignMatrix with shape (6, 2)
  Intercept  abd(F1)[T.True]
          1                1
          1                1
          1                0
          1                1
          1                0
          1                1
  Terms:
    'Intercept' (column 0)
    'abd(F1)' (column 1)
BENY
  • Just to add a comment to this answer: I think patsy can handle a custom function as long as it is vectorized and can accept a Series. – xappppp Jun 12 '18 at 12:54
  • @xappppp Right, there's no such thing as a "patsy function", but you can define a python function and use that. The patsy variables refer to columns of your data, so if your data is a DataFrame they'll be Series objects. You do need to make sure your functions can handle whole vectors/Series at a time. The problem with the original abd is that the python if/else operator only handles scalars, which is a more general R-vs-Python thing, not specific to patsy. – Nathaniel J. Smith Jun 12 '18 at 15:35
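To illustrate the point made in the comments above, here is a minimal sketch of wrapping the scalar logic from the question so that it accepts a whole Series. The names `abd_scalar` and `abd_vec` are made up for this example, and `Series.map` is only one of several ways to do it (`Series.isin`, as in the answer, is more direct).

```python
import pandas as pd
from patsy import dmatrix

dt = pd.DataFrame({'F1': ['a', 'b', 'c', 'd', 'e', 'a'],
                   'F2': ['X', 'X', 'Y', 'Y', 'Z', 'Z']})

def abd_scalar(x):
    # Scalar logic from the question, with the missing `return` added.
    return 1 if x in ['a', 'b', 'd'] else 0

def abd_vec(col):
    # patsy hands the function the whole F1 column (a Series), so the
    # scalar function is applied element by element here.
    return col.map(abd_scalar)

dm = dmatrix("1 + abd_vec(F1)", dt)
print(dm)
```

Because `abd_vec` returns integers rather than booleans, patsy should treat the result as a plain numeric column.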

I tested something that more closely mimics what the R formula system offers. Below is a simpler, more general variant of the accepted answer, where the set of levels is passed as an argument inside the formula. Python provides this flexibility by design. R can surely do the same with a custom function, but that is easy to overlook.

import numpy as np
import pandas as pd
from patsy import dmatrices, dmatrix, demo_data

dt=pd.DataFrame({'F1':['a','b','c','d','e','a'],'F2':['X','X','Y','Y','Z','Z']})
def xx(x, y):
    # 1 where x is one of the values in y, else 0; the *1 turns booleans into integers
    return np.isin(x, list(y)) * 1
dmatrix("1+xx(F1,['a','b'])",dt)

DesignMatrix with shape (6, 2)
  Intercept  xx(F1, ['a', 'b'])
      1                   1
      1                   1
      1                   0
      1                   0
      1                   0
      1                   1
  Terms:
    'Intercept' (column 0)
    "xx(F1, ['a', 'b'])" (column 1)
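One visible difference between the two answers is the column name: the first shows `abd(F1)[T.True]`, the second shows no `[T.True]` suffix. As far as I understand it, patsy treats a boolean result as categorical and a numeric result as a plain numeric column. The sketch below compares the two; `in_set_bool` and `in_set_int` are hypothetical helper names introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

dt = pd.DataFrame({'F1': ['a', 'b', 'c', 'd', 'e', 'a'],
                   'F2': ['X', 'X', 'Y', 'Y', 'Z', 'Z']})

def in_set_bool(x, levels):
    # Returns a boolean array; patsy codes it as a categorical factor.
    return np.isin(x, levels)

def in_set_int(x, levels):
    # Same membership test, cast to integers so it is used as a numeric column.
    return np.isin(x, levels).astype(int)

bool_dm = dmatrix("1 + in_set_bool(F1, ['a', 'b'])", dt)
int_dm = dmatrix("1 + in_set_int(F1, ['a', 'b'])", dt)

# Compare how each design matrix labels its second column.
print(bool_dm.design_info.column_names)
print(int_dm.design_info.column_names)
```

The values in the second column are the same either way; only the labeling and the way patsy encodes the term differ.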
xappppp