
I have a pandas.core.frame.DataFrame that looks like this:

         0 1
0  [1,2,3] 1
1  [2,2,1] 1
2  [1,2,1] 1
...

The last column is the label, and each array under column '0' is a separate data point belonging to that class.

I want this to be turned into:

   x0 x1 x2 label
0  1  2  3  1
1  2  2  1  1
2  1  2  1  1

I have tried the following, with no luck:

ds = ds.apply(lambda x: numpy.ravel(x))

The DataFrame was created as follows, which is obviously not the right way to do it:

<list>.extend(zip(points,labels))
ds = pandas.core.frame.DataFrame(data=<list>)
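For reference, here is a minimal reproduction of why this construction ends up with a list-valued column. It assumes small example lists `points` and `labels` matching the table above; the name `rows` is hypothetical and stands in for the `<list>` placeholder. `zip` pairs each whole point with its label, so pandas stores each point as a single Python object in column 0.

```python
import pandas as pd

# Hypothetical example data matching the table above
points = [[1, 2, 3], [2, 2, 1], [1, 2, 1]]
labels = [1, 1, 1]

rows = []                          # stands in for the <list> placeholder
rows.extend(zip(points, labels))   # each row is a (list, label) tuple
ds = pd.DataFrame(data=rows)

# Column 0 holds whole Python lists rather than separate numbers
print(ds)
print(type(ds.loc[0, 0]))
```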

Any help is appreciated, either on how to fix the existing DataFrame or on how to create it correctly from the two lists (points and labels).

Thanos

4 Answers

Here's how I would do it. First, rename your column 1 to 'id' (so its name doesn't clash with the new columns 0, 1, 2 after concatenating):

df['id'] = df[1]
df = df.drop(1, axis = 1)

Then build a list, objs, of the frames we want to concatenate, and concatenate them:

objs = [df, pd.DataFrame(df[0].tolist())]
pd.concat(objs, axis=1)

           0    id  0   1   2
0   [1, 2, 3]   1   1   2   3
1   [2, 2, 1]   1   2   2   1
2   [1, 2, 1]   1   1   2   1
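To go one step further, to the exact `x0 x1 x2 label` layout asked for in the question, a sketch along the same lines might look like this. It assumes the lists sit in integer column `0` and the label in column `1`, as in the question; using `add_prefix` and `Series.rename` for the naming is my own choice, not part of the answer above. Passing `index=df.index` keeps the expanded rows aligned with the labels even if the index is not 0..n-1.

```python
import pandas as pd

# Example frame shaped like the question's: lists in column 0, label in column 1
df = pd.DataFrame({0: [[1, 2, 3], [2, 2, 1], [1, 2, 1]], 1: [1, 1, 1]})

# Expand the lists into their own columns (x0, x1, ...), keeping the original index
expanded = pd.DataFrame(df[0].tolist(), index=df.index).add_prefix('x')

# Attach the label column under a readable name
result = pd.concat([expanded, df[1].rename('label')], axis=1)
print(result)
```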
jeremycg
  • This solution works the best! I modified the second part slightly to `pd.concat([df, pd.DataFrame(list(df[0]))], axis=1)`, which seems to perform slightly better than using `.tolist()`. Thank you! – Thanos Apr 12 '16 at 12:06
I assume your current column titles are text instead of integers.

df2 = pd.concat([pd.DataFrame(df['0'].tolist(), index=df.index), df['1']], axis=1)
df2.columns = ['x' + str(c) for c in df2.columns[:-1]] + ['label']

>>> df2
   x0  x1  x2  label
0   1   2   3      1
1   2   2   1      1
2   1   2   1      1

Passing the column's lists to the DataFrame constructor expands them into one row per list and one column per list element:

>>> pd.DataFrame(df['0'].tolist())
   0  1  2
0  1  2  3
1  2  2  1
2  1  2  1

You then just need to concatenate the label column and rename all of the columns.
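For comparison, unzipping the column with `zip(*...)` and passing the lists straight to the constructor give transposed results. The sketch below assumes string column titles '0' and '1', as this answer does. On Python 3, `zip` returns an iterator, which the pandas version in the comment below rejected, so it is wrapped in `list()` here.

```python
import pandas as pd

# Example frame with string column titles, as this answer assumes
df = pd.DataFrame({'0': [[1, 2, 3], [2, 2, 1], [1, 2, 1]], '1': [1, 1, 1]})

# zip(*...) regroups the values by position: one tuple per list element.
# On Python 3 it is an iterator, so materialize it with list() first.
by_position = pd.DataFrame(list(zip(*df['0'])))

# Passing the lists directly keeps one row per original list.
by_row = pd.DataFrame(df['0'].tolist(), index=df.index)

print(by_position)
print(by_row)
```

With more than three rows, the `zip` version would no longer even have one row per label, so the row-wise form is the one that lines up with the label column.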

Alexander
  • The columns are integers. When I tried `pd.DataFrame(zip(*ds[0]))`, I got the following error from pandas' frame.py (line 285, in the `isinstance(data, collections.Iterator)` check): `TypeError: data argument can't be an iterator`. Any ideas? – Thanos Apr 12 '16 at 10:36
Instead of trying to explode the column afterwards, you can create your DataFrame differently to get what you want. See the code below:

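import pandas as pd
points = [[1,2,3],[2,2,1],[1,2,1]]
labels = [1,1,1]
# pull each position out of the points into its own list
x0 = [p[0] for p in points]
x1 = [p[1] for p in points]
x2 = [p[2] for p in points]
# columns= fixes the column order regardless of pandas version
df = pd.DataFrame({'x0': x0, 'x1': x1, 'x2': x2, 'label': labels},
                  columns=['x0', 'x1', 'x2', 'label'])
print(df)

This gives:

   x0  x1  x2  label
0   1   2   3      1
1   2   2   1      1
2   1   2   1      1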
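The same idea generalizes without one list comprehension per position: the DataFrame constructor already turns a list of equal-length lists into one column per position. A minimal sketch, assuming every point has the same length as the first one and reusing the example `points` and `labels` from above:

```python
import pandas as pd

points = [[1, 2, 3], [2, 2, 1], [1, 2, 1]]
labels = [1, 1, 1]

# One column name per position, so longer points need no code changes
n = len(points[0])
df = pd.DataFrame(points, columns=['x' + str(i) for i in range(n)])

# Add the labels as the last column
df['label'] = labels
print(df)
```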
nitin
The best I can offer:

import numpy as np
# first convert your lists to a 2-D array (one row per list), then add one column per position
tmp = np.array( df[0].tolist() )

for r in range(tmp.shape[1]):
    df['x' + str(r)] = tmp[:, r]
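The loop can also be replaced by a single join: wrap the 2-D array in a DataFrame that shares `df`'s index and attach all the new columns at once. A sketch under the question's layout (lists in integer column 0, label in column 1), with the `x0`, `x1`, ... names chosen to match the question:

```python
import numpy as np
import pandas as pd

# Example frame shaped like the question's: lists in column 0, label in column 1
df = pd.DataFrame({0: [[1, 2, 3], [2, 2, 1], [1, 2, 1]], 1: [1, 1, 1]})

# Stack the lists into a 2-D array: one row per list, one column per position
tmp = np.array(df[0].tolist())

# Wrap the array in a frame aligned with df's index and join it in one step
cols = ['x' + str(r) for r in range(tmp.shape[1])]
df = df.join(pd.DataFrame(tmp, index=df.index, columns=cols))
print(df)
```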
tnknepp