I have a DataFrame that looks like this:
import pandas as pd

df = pd.DataFrame([
['a', 1],
['b', 1],
['c', 1],
['a', 2],
['c', 3],
['b', 4],
['c', 4]
], columns=['item', 'user'])
Each user is repeated across multiple rows (with different items).
I would like to perform some kind of LabelEncoder/LabelBinarizer-like transform to convert the DataFrame into something that looks like this:
pd.DataFrame([
[1, 1, 1], #user 1
[1, 0, 0], #user 2
[0, 0, 1], #user 3
[0, 1, 1] #user 4
], columns=['a', 'b', 'c'])
I likely don't want to use pandas (pivot, get_dummies, crosstab), because I want to pass a new user to the transformer:
new_user = pd.DataFrame([
['c', 5],
['d', 5]
], columns=['item', 'user'])
And get back something like this:
[0, 0, 1]
Important: the solution must handle the new-user case (dropping the unseen item 'd') and must preserve the column order as well as the dimensions.
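To make the requirement concrete, here is a minimal sketch of the behaviour described above. It assumes that scikit-learn's MultiLabelBinarizer is an acceptable transformer (that class is not named in the question) and that grouping the items into one list per user is an acceptable preprocessing step. The names items_per_user, encoded and new_encoded are only illustrative.

```python
import warnings

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame([
    ['a', 1], ['b', 1], ['c', 1],
    ['a', 2],
    ['c', 3],
    ['b', 4], ['c', 4],
], columns=['item', 'user'])

new_user = pd.DataFrame([
    ['c', 5],
    ['d', 5],
], columns=['item', 'user'])

# Collect each user's items into one list per user (rows ordered by user id).
items_per_user = df.groupby('user')['item'].apply(list)

# Assumption: MultiLabelBinarizer plays the role of the desired transformer.
# Fitting learns the item vocabulary, which fixes the column order.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(items_per_user),
    index=items_per_user.index,
    columns=mlb.classes_,
)

# Items unseen during fit (here 'd') are ignored at transform time;
# scikit-learn emits a UserWarning about them, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter('ignore', UserWarning)
    new_encoded = mlb.transform(new_user.groupby('user')['item'].apply(list))

print(encoded)
print(new_encoded)
```

Because the columns come from the fitted vocabulary rather than from whatever items appear in the new data, the output for a new user keeps the same width and column order as the original matrix.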