To use Dask's parallelized dataframes (built on top of pandas), you need to tell pip to install some "extras", as described in the Dask installation documentation:
pip install "dask[dataframe]"
Or you could just do
pip install "dask[complete]"
to get the whole bag of tricks. NB: whether the double quotes are required depends on your shell.
The justification for this is (or was) mentioned in the Dask documentation:
We do this so that users of the lightweight core dask scheduler aren’t required to download the more exotic dependencies of the collections (numpy, pandas, etc.)
As mentioned in Obinna's answer, you may wish to do this inside a virtualenv, or use pip install --user
to put the libraries in your home directory, if, say, you don't have admin privileges on the host OS.
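Once installed, you can check which of the heavier optional dependencies are actually importable using only the standard library. This is just a sketch (the helper name is mine, not part of dask); the module names are the ones the tracebacks below complain about:

```python
import importlib.util

def missing_dependencies(modules=("pandas", "cloudpickle", "toolz")):
    """Return the subset of `modules` that cannot be imported here."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]

missing = missing_dependencies()
if missing:
    # These are the packages that "dask[dataframe]" would pull in.
    print('Missing: %s -- try pip install "dask[dataframe]"' % ", ".join(missing))
else:
    print("All optional dependencies found")
```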
Extra details
At Dask 0.13.0 and below, dask/async.py depended on toolz's identity function. There is a closed pull request associated with GitHub issue #1849 to remove this dependency. If, for some reason, you are stuck with an older version of dask, you can work around that particular issue by simply doing pip install toolz.
But this wouldn't (completely) fix your problem with import dask.dataframe as dd
anyway, because you'd still get this error:
>>> import dask.dataframe as dd
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/path/to/venv/lib/python2.7/site-packages/dask/dataframe/__init__.py", line 3, in <module>
from .core import (DataFrame, Series, Index, _Frame, map_partitions,
File "/path/to/venv/lib/python2.7/site-packages/dask/dataframe/core.py", line 12, in <module>
import pandas as pd
ImportError: No module named pandas
Or, if you already had pandas installed, you'd get ImportError: No module named cloudpickle
. So pip install "dask[dataframe]"
seems to be the way to go if you're in this situation.