Pandas convert Series of strings to Series of lists of strings (of size 1) for encoding

Question

I know the title is confusing, but let me explain. I'm trying to prepare a Series for sklearn's MultiLabelBinarizer, with each string being a separate user id I want to one-hot encode. Instead of treating each string as a single label, the binarizer iterates over the individual characters of each string. Doing series.apply(list) does the same thing, splitting each string into its individual characters. If the series looks like:

0 '3436803478'
1 '1230782212'
2 '7320482099'
...

Then I want the output to be

0 ['3436803478']
1 ['1230782212']
2 ['7320482099']
...

Instead of

0 ['3','4','3','6','8','0','3','4','7','8']
1 ['1','2','3','0','7','8','2','2','1','2']
2 ['7','3','2','0','4','8','2','0','9','9']
...

If I were working with a plain Python list, I would just do ids = [[s] for s in values], but since we're working with a Series and apply(), I need something like a function name that does what [] does. list() doesn't work, as explained here

Note: The strings actually start as integers, but I can get around that with .apply(str)

@SandeepKadapa I explained in the first paragraph that `s.apply(list)` splits into individual characters, which I don't want. — sawyermclane, Oct 06 '18 at 04:59
Can you show your actual DataFrame structure, even though you got the job done? — Karn Kumar, Oct 06 '18 at 05:03

score 0 · Accepted Answer · answered Oct 06 '18 at 05:00

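Using s.apply(lambda x: [x]) works: the lambda wraps each value in a one-element list instead of iterating over its characters. It can be chained after .apply(str) when the ids start out as integers.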
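
For illustration, here is a minimal sketch of the whole flow, assuming the ids start as integers as mentioned in the question's note. The sample values are the three ids from the question, and the variable names (s, wrapped, mlb, encoded) are made up for this example. It uses astype(str) for the string conversion, which should be equivalent to .apply(str) here.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data (assumed): user ids stored as integers.
s = pd.Series([3436803478, 1230782212, 7320482099])

# Convert each id to a string, then wrap it in a one-element list
# so that each id is treated as a single label, not a sequence of characters.
wrapped = s.astype(str).apply(lambda x: [x])

# Each row of `wrapped` is now an iterable containing exactly one label.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(wrapped)

print(wrapped.tolist())
print(mlb.classes_)
print(encoded)
```

Each row of the encoded array should contain a single 1, in the column matching that row's id in mlb.classes_.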

sawyermclane