0

I have an SFrame, e.g.:

a | b
-----
2 | 31 4 5
0 | 1 9
1 | 2 84

Now I want to get the following result:

a | b      | c  | d  | e
------------------------
2 | 31 4 5 | 31 | 4  | 5
0 | 1 9    | 1  | 9  | 0
1 | 2 84   | 2  | 84 | 0

Any idea how to do it? Or do I have to use some other tool?

Thanks

ikel

3 Answers

1

Using pandas:

In [409]: sf
Out[409]: 
Columns:
    a   int
    b   str

Rows: 3

Data:
+---+--------+
| a |   b    |
+---+--------+
| 2 | 31 4 5 |
| 0 |  1 9   |
| 1 |  2 84  |
+---+--------+
[3 rows x 2 columns]

In [410]: df = sf.to_dataframe()

In [411]: newdf =  pd.DataFrame(df.b.str.split().tolist(), columns = ['c', 'd', 'e']).fillna('0')

In [412]: df.join(newdf)
Out[412]: 
   a       b   c   d  e
0  2  31 4 5  31   4  5
1  0     1 9   1   9  0
2  1    2 84   2  84  0

Converting back to SFrame:

In [498]: SFrame(df.join(newdf))
Out[498]: 
Columns:
    a   int
    b   str
    c   str
    d   str
    e   str

Rows: 3

Data:
+---+--------+----+----+---+
| a |   b    | c  | d  | e |
+---+--------+----+----+---+
| 2 | 31 4 5 | 31 | 4  | 5 |
| 0 |  1 9   | 1  | 9  | 0 |
| 1 |  2 84  | 2  | 84 | 0 |
+---+--------+----+----+---+
[3 rows x 5 columns]

If you want integers/floats, you can also do:

In [506]: newdf =  pd.DataFrame(map(lambda x: [int(y) for y in x], df.b.str.split().tolist()), columns = ['c', 'd', 'e'])

In [507]: newdf
Out[507]: 
    c   d    e
0  31   4  5.0
1   1   9  NaN
2   2  84  NaN

In [508]: SFrame(df.join(newdf))
Out[508]: 
Columns:
    a   int
    b   str
    c   int
    d   int
    e   float

Rows: 3

Data:
+---+--------+----+----+-----+
| a |   b    | c  | d  |  e  |
+---+--------+----+----+-----+
| 2 | 31 4 5 | 31 | 4  | 5.0 |
| 0 |  1 9   | 1  | 9  | nan |
| 1 |  2 84  | 2  | 84 | nan |
+---+--------+----+----+-----+
[3 rows x 5 columns]
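
If the goal is integer columns padded with 0 (as in the question) rather than a float column containing nan, one possible variation is to pad before casting. This is only a sketch: the DataFrame below is a hand-built stand-in for `sf.to_dataframe()`, and it assumes every row has at most three values.

```python
import pandas as pd

# Hypothetical stand-in for sf.to_dataframe(), built from the question's data.
df = pd.DataFrame({"a": [2, 0, 1], "b": ["31 4 5", "1 9", "2 84"]})

# expand=True spreads the split pieces over columns 0, 1, 2;
# rows with fewer pieces get missing values in the trailing columns.
parts = df["b"].str.split(expand=True)

# Pad with the string '0' first, then cast, so every new column is integer.
parts = parts.fillna("0").astype(int)
parts.columns = ["c", "d", "e"]  # assumes exactly three columns were produced

result = df.join(parts)
print(result)
```

Because the padding happens while the values are still strings, no nan ever enters the numeric columns, so the conversion back to an SFrame should see a single type per column.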
Nehal J Wani
  • Thanks, this works, but when converting back to an SFrame with graphlab.SFrame(df) I get this error: "TypeError: A common type cannot be infered from types integer, string." Any idea? – ikel Aug 28 '16 at 09:42
  • In fact, if I do df.describe(), I get: TypeError: unhashable type: 'dict' – ikel Aug 28 '16 at 09:52
  • @ikel I replaced `fillna(0)` with `fillna('0')` and now the conversion works! – Nehal J Wani Aug 28 '16 at 09:56
1
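def customsplit(string, column):
    # Split on whitespace, then pad with '0' until there are `column` values.
    val = string.split()
    diff = column - len(val)
    val += ['0'] * diff
    return val

# Each element of `a` is a list of exactly 3 strings.
a = sf['b'].apply(lambda x: customsplit(x, 3))
sf['c'] = [i[0] for i in a]
sf['d'] = [i[1] for i in a]
sf['e'] = [i[2] for i in a]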

sf

Output:

a | b      | c  | d  | e
------------------------
2 | 31 4 5 | 31 | 4  | 5
0 | 1 9    | 1  | 9  | 0
1 | 2 84   | 2  | 84 | 0
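
The number of columns is hard-coded to 3 above. If it is not known in advance, the same split-and-pad idea can be sketched in plain Python by padding to the widest row. The helper name `split_and_pad` and the sample list are made up for illustration, and the list stands in for the values of `sf['b']`.

```python
def split_and_pad(values, fill="0"):
    """Split each whitespace-separated string and pad short rows with `fill`."""
    rows = [v.split() for v in values]
    # The widest row decides how many output columns there are.
    width = max((len(r) for r in rows), default=0)
    return [r + [fill] * (width - len(r)) for r in rows]


# Hypothetical stand-in for the values of sf['b'].
padded = split_and_pad(["31 4 5", "1 9", "2 84"])

# Transpose the rows into per-column lists, one list per new column.
columns = [list(col) for col in zip(*padded)]
print(columns)
```

Each inner list of `columns` could then be assigned to a new column in the same way as `sf['c']`, `sf['d']` and `sf['e']` above.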
Ahsanul Haque
0

This can be done with SFrame itself, without pandas. Just use the 'unpack' function.

Pandas provides a variety of functions for handling datasets, but converting an SFrame to a pandas DataFrame and back is inconvenient.

If you handle more than 10 gigabytes of data, pandas cannot handle the dataset properly (but SFrame can).

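import sframe

# your SFrame (note: column b holds lists here, not strings; a string
# column as in the question would first need to be split into lists)
sf = sframe.SFrame({'a': [2, 0, 1], 'b': [[31, 4, 5], [1, 9], [2, 84]]})

# just use the 'unpack()' function: b becomes columns b.0, b.1, b.2
sf2 = sf.unpack('b')

# change the column names
sf2 = sf2.rename({'b.0': 'c', 'b.1': 'd', 'b.2': 'e'})

# fill the missing values in column e with zero
# (fillna takes the column name and the value, and returns an SFrame)
sf2 = sf2.fillna('e', 0)

# merge the original SFrame and the new SFrame on column a
sf.join(sf2, 'a')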
pocari