
I am new to dask and want to create a dask dataframe from a Python list of tuples. In pandas, DataFrame.from_records converts a list of tuples to a dataframe. Which function gives me the same functionality in dask? My data looks a bit like this:

[(21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', ''), (21262, 'booking', 'NULL')]

I am currently using this code to perform the task. Is this the correct way of doing it?

import pandas as pd
import dask
import dask.dataframe as dd

names = ['id', 'status', 'reg_entry']
# cursor is assumed to be a database cursor on which a query has already been executed
dfs = dask.delayed(pd.DataFrame.from_records)(cursor.fetchall(), columns=names)

df = dd.from_delayed(dfs)
Ali. K
  • Welcome to SO. Please read [How to ask a good question](https://stackoverflow.com/help/how-to-ask). Can you provide code samples of what you have already tried? – Florian Oct 16 '18 at 07:38
  • @Florian sorry for not being clearer the first time. I am new to this forum and in learning phase. Thanks for correcting me. – Ali. K Oct 16 '18 at 07:45

1 Answer


You can create a dask dataframe from an existing pandas dataframe, which lets you use any of the pandas constructors:

import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame([(21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', ''), (21262, 'booking', 'NULL')], columns=['id', 'status', 'reg_entry'])
ddf = dd.from_pandas(df, npartitions=2)
Tina Iris