
The same task can easily be done in pandas with:

import pandas as pd
df = pd.DataFrame({"lists":[[i, i+1] for i in range(10)]})
df[['left','right']] = pd.DataFrame([x for x in df.lists])

But I can't figure out how to do something similar with a dask.dataframe.

Update

So far I found this workaround

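import dask.dataframe as dd

ddf = dd.from_pandas(df, npartitions=2)
# meta declares the name and dtype of each result so dask need not infer them
ddf["left"] = ddf.apply(lambda x: x["lists"][0], axis=1, meta=("left", "int64"))
ddf["right"] = ddf.apply(lambda x: x["lists"][1], axis=1, meta=("right", "int64"))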

I'm wondering if there is another way to proceed.

rpanai

1 Answer


You could achieve this using assign:

ddf = ddf.assign(left=ddf.lists.map(lambda x: x[0]),
                 right=ddf.lists.map(lambda x: x[1]))

For example, calling

ddf.compute()

returns

     lists  left  right
0   [0, 1]     0      1
1   [1, 2]     1      2
2   [2, 3]     2      3
3   [3, 4]     3      4
4   [4, 5]     4      5
5   [5, 6]     5      6
6   [6, 7]     6      7
7   [7, 8]     7      8
8   [8, 9]     8      9
9  [9, 10]     9     10

An alternative way of phrasing this (see comments, below) might be

ddf = ddf.assign(**{k: ddf.lists.map(lambda x, i=i: x[i]) 
                 for i, k in enumerate(['left', 'right'])})
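
The `i=i` default argument in the lambda is what makes the comprehension safe. Python closures look up `i` when the lambda is called, not when it is defined, and because Dask evaluates lazily the lambdas run only after the comprehension has finished. A plain-Python sketch of the difference (the names `rows`, `late`, and `bound` are illustrative only):

```python
rows = [[0, 1], [1, 2]]
names = ["left", "right"]

# Without a default argument, every lambda reads the final value of i (1).
late = {k: (lambda x: x[i]) for i, k in enumerate(names)}

# With i=i, each lambda captures the value of i at definition time.
bound = {k: (lambda x, i=i: x[i]) for i, k in enumerate(names)}

late_left = [late["left"](r) for r in rows]    # picks index 1 by mistake
bound_left = [bound["left"](r) for r in rows]  # picks index 0 as intended
```

This may be why the comprehension from the comments behaves in pandas, where `map` runs immediately, but not in Dask.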
thebeancounter
mdurant
  • Thanks. I'm wondering if it is possible to use a loop with `assign` and/or `apply`. In pandas I can do something like `{value: df.lists.map(lambda x: x[key]) for key, value in enumerate(["left","right"])}`, but this doesn't work with dask. – rpanai Jul 25 '17 at 14:02
  • Since `assign` takes optional keyword arguments, you could use `**kwargs` syntax to pass a dictionary comprehension. – mdurant Jul 25 '17 at 14:42
  • I did that, as in my previous comment, but I got an error. – rpanai Jul 25 '17 at 14:59
  • I think using `.map()`, which was built for pure mapping with the help of dictionaries, instead of `.apply()`, which was created for exactly that (applying functions when simple mapping is not enough), is 'slightly wrong'. I understand that you are trying to avoid the hassle of `meta=...`, but it was put there for a reason: it's what the cluster needs. – Alex Fedotov Jun 13 '20 at 01:12