graphlab adding variable columns from existing sframe

Question

I have an SFrame, e.g.:

a | b
-----
2 | 31 4 5
0 | 1 9
1 | 2 84

Now I want to get the following result:

a | b      | c  | d  | e
-------------------------
2 | 31 4 5 | 31 | 4  | 5
0 | 1 9    | 1  | 9  | 0
1 | 2 84   | 2  | 84 | 0

Any idea how to do it? Or do I have to use some other tool?

Thanks
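The transformation being asked for can be stated row by row: split the space-separated string in `b`, then pad the result with `'0'` until there are three values for `c`, `d` and `e`. A minimal plain-Python sketch of that rule, assuming no row holds more than three values (the helper name `split_and_pad` is hypothetical, not part of graphlab or pandas):

```python
def split_and_pad(text, width=3, fill="0"):
    """Split a space-separated string and right-pad it with `fill` to `width` items."""
    parts = text.split()
    return parts + [fill] * (width - len(parts))

# The three rows of the example SFrame, as (a, b) pairs
rows = [(2, "31 4 5"), (0, "1 9"), (1, "2 84")]
for a, b in rows:
    c, d, e = split_and_pad(b)
    print(a, b, c, d, e)
```

The answers below implement this same split-and-pad step with pandas or with SFrame operations.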

Nehal J Wani · Accepted Answer · 2016-08-28T10:00:41.940

Using pandas (the session below assumes `import pandas as pd` and `from graphlab import SFrame`):

In [409]: sf
Out[409]: 
Columns:
    a   int
    b   str

Rows: 3

Data:
+---+--------+
| a |   b    |
+---+--------+
| 2 | 31 4 5 |
| 0 |  1 9   |
| 1 |  2 84  |
+---+--------+
[3 rows x 2 columns]

In [410]: df = sf.to_dataframe()

In [411]: newdf = pd.DataFrame(df.b.str.split().tolist(), columns=['c', 'd', 'e']).fillna('0')

In [412]: df.join(newdf)
Out[412]: 
   a       b   c   d  e
0  2  31 4 5  31   4  5
1  0     1 9   1   9  0
2  1    2 84   2  84  0

Converting back to SFrame:

In [498]: SFrame(df.join(newdf))
Out[498]: 
Columns:
    a   int
    b   str
    c   str
    d   str
    e   str

Rows: 3

Data:
+---+--------+----+----+---+
| a |   b    | c  | d  | e |
+---+--------+----+----+---+
| 2 | 31 4 5 | 31 | 4  | 5 |
| 0 |  1 9   | 1  | 9  | 0 |
| 1 |  2 84  | 2  | 84 | 0 |
+---+--------+----+----+---+
[3 rows x 5 columns]

If you want integers/floats, you can also do:

In [506]: newdf = pd.DataFrame(map(lambda x: [int(y) for y in x], df.b.str.split().tolist()), columns=['c', 'd', 'e'])

In [507]: newdf
Out[507]: 
    c   d    e
0  31   4  5.0
1   1   9  NaN
2   2  84  NaN

In [508]: SFrame(df.join(newdf))
Out[508]: 
Columns:
    a   int
    b   str
    c   int
    d   int
    e   float

Rows: 3

Data:
+---+--------+----+----+-----+
| a |   b    | c  | d  |  e  |
+---+--------+----+----+-----+
| 2 | 31 4 5 | 31 | 4  | 5.0 |
| 0 |  1 9   | 1  | 9  | nan |
| 1 |  2 84  | 2  | 84 | nan |
+---+--------+----+----+-----+
[3 rows x 5 columns]
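Column `e` comes out as `float` because the short rows leave a missing value there, and pandas represents a missing number as `NaN`, which is a float. If you would rather have zeros and keep every column an integer, one option is to pad each row with `0` before building the DataFrame. A plain-Python sketch of that padding step, assuming at most three values per row (the helper `to_int_row` is hypothetical); the resulting lists could then be passed to `pd.DataFrame(..., columns=['c', 'd', 'e'])` as in the answer:

```python
def to_int_row(text, width=3):
    """Convert '2 84' into [2, 84, 0]: parse ints, then pad with integer zeros."""
    values = [int(tok) for tok in text.split()]
    return values + [0] * (width - len(values))

rows = [to_int_row(b) for b in ["31 4 5", "1 9", "2 84"]]
print(rows)  # [[31, 4, 5], [1, 9, 0], [2, 84, 0]]
```

Because every entry is already an int, no column needs a float `NaN` placeholder.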

Thanks, this works, but when converting back to SFrame with graphlab.SFrame(df) I get this error: "TypeError: A common type cannot be infered from types integer, string." Any idea? — ikel, Aug 28 '16 at 09:42
In fact, if I do df.describe(), I get TypeError: unhashable type: 'dict' — ikel, Aug 28 '16 at 09:52
@ikel I replaced `fillna(0)` with `fillna('0')` and now the conversion works! — Nehal J Wani, Aug 28 '16 at 09:56
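The `TypeError` comes from mixing types inside one column: an SFrame column needs a single type, and after `fillna(0)` column `e` holds strings produced by `split()` plus the integer `0`. Filling with the string `'0'` keeps the column all-`str`. A plain-Python sketch that reproduces the mix (the helpers `fill_missing` and `column_type_names` are hypothetical, and `None` stands in for the gaps pandas leaves when rows have different lengths):

```python
def fill_missing(rows, value):
    """Replace None entries (the gaps left by short rows) with `value`."""
    return [[value if v is None else v for v in row] for row in rows]

def column_type_names(rows):
    """Return, for each column, the set of Python type names found in it."""
    return [{type(row[i]).__name__ for row in rows} for i in range(len(rows[0]))]

split_rows = [["31", "4", "5"], ["1", "9", None], ["2", "84", None]]

print(column_type_names(fill_missing(split_rows, 0)))    # column e mixes int and str
print(column_type_names(fill_missing(split_rows, "0")))  # every column is str
```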

score 1 · Answer 2 · answered Aug 28 '16 at 09:26

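def customsplit(string, column):
    # split on whitespace and pad with '0' so every row has `column` values
    val = string.split()
    diff = column - len(val)
    val += ['0'] * diff
    return val

# SArray of lists, e.g. '1 9' -> ['1', '9', '0']
a = sf['b'].apply(lambda x: customsplit(x, 3))
sf['c'] = [i[0] for i in a]
sf['d'] = [i[1] for i in a]
sf['e'] = [i[2] for i in a]

sf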

Output:

a | b      | c  | d  | e
-------------------------
2 | 31 4 5 | 31 | 4  | 5
0 | 1 9    | 1  | 9  | 0
1 | 2 84   | 2  | 84 | 0
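One detail of `customsplit`: when a row has more values than `column`, `diff` is negative, `['0'] * diff` is an empty list, and the function returns the longer list unchanged, so the extra values are silently dropped by the `i[0]` to `i[2]` indexing. If you prefer the helper itself to guarantee exactly `column` items, a small variant (hypothetical name `customsplit_exact`) pads first and then slices:

```python
def customsplit_exact(string, column):
    """Split on whitespace, pad with '0', and return exactly `column` items."""
    val = string.split()
    return (val + ['0'] * column)[:column]

print(customsplit_exact("1 9", 3))       # ['1', '9', '0']
print(customsplit_exact("7 8 9 10", 3))  # ['7', '8', '9']
```

It can be dropped into the `apply` call above in place of `customsplit`.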

score 0 · Answer 3 · answered Aug 29 '16 at 00:21

This can be done with SFrame itself, without Pandas: just use the `unpack` function.

Pandas provides a variety of functions for handling datasets, but converting an SFrame to a Pandas DataFrame and back is inconvenient.

If you handle more than about 10 GB of data, Pandas may not handle the dataset properly, because it keeps everything in memory. SFrame is disk-backed, so it can.

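import sframe  # standalone SFrame package; graphlab.SFrame works the same way

# your SFrame, with 'b' already stored as lists of numbers
sf = sframe.SFrame({'a': [2, 0, 1], 'b': [[31, 4, 5], [1, 9], [2, 84]]})

# just use the unpack() function: it splits 'b' into 'b.0', 'b.1', 'b.2'
sf2 = sf.unpack('b')

# change the column names
sf2 = sf2.rename({'b.0': 'c', 'b.1': 'd', 'b.2': 'e'})

# fill the missing values with zero (assign back to the column, not to sf2)
sf2['e'] = sf2['e'].fillna(0)

# merge the original SFrame and the new SFrame on column 'a'
sf.join(sf2, 'a')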
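`unpack` works on a list (or dict) column, whereas `b` in the question holds strings such as `'31 4 5'`. Converting it first with `sf['b'] = sf['b'].apply(lambda s: [int(x) for x in s.split()])` should give the list column this answer starts from (untested here). To make the reshaping itself concrete, here is a plain-Python model of what `unpack('b')` does to lists of different lengths, assuming it keeps the other columns, names the new ones `b.0`, `b.1`, ... and leaves `None` where a row is too short. `unpack_model` is a hypothetical helper, not the SFrame implementation:

```python
def unpack_model(rows, column, prefix=None):
    """Plain-Python model of unpacking a list column into numbered columns.

    rows: list of dicts; rows[i][column] is a list of varying length.
    Positions a short list does not reach become None (the gaps fillna(0) fills).
    """
    prefix = prefix or column
    width = max(len(r[column]) for r in rows)
    out = []
    for r in rows:
        # keep every other column unchanged
        new = {k: v for k, v in r.items() if k != column}
        vals = r[column]
        for i in range(width):
            new["{}.{}".format(prefix, i)] = vals[i] if i < len(vals) else None
        out.append(new)
    return out

rows = [{"a": 2, "b": [31, 4, 5]}, {"a": 0, "b": [1, 9]}, {"a": 1, "b": [2, 84]}]
for row in unpack_model(rows, "b"):
    print(row)
```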