I have data that looks like this. Each cell holds a list of keys, and the lists vary in length. Some rows are NaN.

    like                                match
0   [{'timestamp', 'type'}]              [{'timestamp', 'type'}]
1   [{'timestamp', 'comment', 'type'}]   [{'timestamp', 'type'}]
2   NaN                                 NaN

I want to split these lists into their own columns. I want to keep all the data (and make it NaN if it is missing). I've tried following this tutorial and doing this:

df1 = pd.DataFrame(df['like'].values.tolist())
df1.columns = 'like_'+ df1.columns

df2 = pd.DataFrame(df['match'].values.tolist())
df2.columns = 'match_'+ df2.columns

col = df.columns.difference(['like','match'])
df = pd.concat([df[col], df1, df2],axis=1)

I get this error.

Traceback (most recent call last):
  File "link to my file", line 12, in <module>
    df1 = pd.DataFrame(df['like'].values.tolist())
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 509, in __init__
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 524, in to_arrays
    return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 561, in _list_to_arrays
    content = list(lib.to_object_array(data).T)
  File "pandas/_libs/lib.pyx", line 2448, in pandas._libs.lib.to_object_array
TypeError: object of type 'float' has no len()

Can someone help me understand what I'm doing wrong?

842x604

1 Answer

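The error comes from the NaN rows: NaN is a float, and pd.DataFrame calls len() on every element of the list returned by values.tolist(). If you drop the rows of NaNs, you get past this issue, but then your prefix line fails, because the new column labels are integers and cannot be concatenated with a string. Use add_prefix for the prefixes instead: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add_prefix.html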
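If you want to keep the NaN rows rather than drop them, one option is to swap each NaN for an empty list before building the new frame. Below is a minimal sketch of that idea, not a tested fix for your exact data: the sample frame, the `id` column, and the helper name `expand_list_column` are all made up for illustration, and it assumes every non-missing cell is a plain Python list.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's: each cell is a list or NaN.
df = pd.DataFrame({
    "id": [10, 11, 12],
    "like": [["a", "b"], ["a", "b", "c"], np.nan],
    "match": [["x"], ["x", "y"], np.nan],
})


def expand_list_column(frame, name):
    """Spread the lists in frame[name] across columns name_0, name_1, ..."""
    # Swap non-list cells (such as NaN) for empty lists so every row has a len().
    rows = [cell if isinstance(cell, list) else [] for cell in frame[name]]
    # Reuse the original index so the rows line up again in pd.concat.
    expanded = pd.DataFrame(rows, index=frame.index)
    # The new column labels are integers, so use add_prefix
    # instead of concatenating a string with the column index.
    return expanded.add_prefix(f"{name}_")


other = df.drop(columns=["like", "match"])
result = pd.concat(
    [other, expand_list_column(df, "like"), expand_list_column(df, "match")],
    axis=1,
)
print(result)
```

Shorter lists and the former NaN rows end up as missing values in the new columns, so no rows are lost.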

Jonathan Leon