I tried installing Databricks' new koalas package using the recommended pip install koalas, but it failed on the pyarrow install.

I then installed pyarrow and retried koalas, but it still failed on pyarrow. I visited the GitHub page, which informed me:

If this fails to install the pyarrow dependency, you may want to try installing with Python 3.6.x, as pip install arrow does not work out of the box for 3.7 https://github.com/apache/arrow/issues/1125.

I searched through the discussions and could not make sense of the "solutions", perhaps because there aren't any. I am using Python 3.7.3. The error messages I get are:

  creating build/temp.macosx-10.7-x86_64-3.7
  -- Runnning cmake for pyarrow
  cmake -DPYTHON_EXECUTABLE=/anaconda3/bin/python  -DPYARROW_BOOST_USE_SHARED=on -DCMAKE_BUILD_TYPE=release /private/tmp/pip-install-uhdr9agf/pyarrow
  unable to execute 'cmake': No such file or directory
  error: command 'cmake' failed with exit status 1

  ----------------------------------------
  Failed building wheel for pyarrow
  Running setup.py clean for pyarrow
Failed to build pyarrow
Installing collected packages: pyarrow, koalas
  Found existing installation: pyarrow 0.13.0
    Uninstalling pyarrow-0.13.0:
      Successfully uninstalled pyarrow-0.13.0
  Running setup.py install for pyarrow ... error
    Complete output from command /anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-install-uhdr9agf/pyarrow/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/tmp/pip-record-i7k4nwil/install-record.txt --single-version-externally-managed --compile:

...

    -- Runnning cmake for pyarrow
    cmake -DPYTHON_EXECUTABLE=/anaconda3/bin/python  -DPYARROW_BOOST_USE_SHARED=on -DCMAKE_BUILD_TYPE=release /private/tmp/pip-install-uhdr9agf/pyarrow
    unable to execute 'cmake': No such file or directory
    error: command 'cmake' failed with exit status 1

    ----------------------------------------
  Rolling back uninstall of pyarrow

... 

Command "/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-install-uhdr9agf/pyarrow/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/tmp/pip-record-i7k4nwil/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/tmp/pip-install-uhdr9agf/pyarrow/

I have tried pip install koalas, sudo pip install koalas, and sudo -H pip install koalas, and all produce the same error message.

Has anyone found a solution to these errors? Or is koalas not (yet) compatible with 3.7?

Frank B.
  • I also tried it with Python 3.7 and it just doesn't work. It comes down to the arrow dependency, which won't install. It works fine for 3.6, though. – Achilleus Apr 29 '19 at 19:23

1 Answer

You probably saw this, but the GitHub issue you mentioned regarding arrow says: "It does work for Python<3.7. For Python 3.7, you need to have installed the Arrow C++ packages via different means."

I was able to get koalas working on a single machine in Spark local mode with Python 3.6, and I ran the GitHub sample script successfully. It also specifies "pyspark>=2.4.0 is recommended".

If you try 3.6, I expect it will work for you:

import sys
print(sys.version)
import pandas as pd
import databricks.koalas as ks
import pyarrow as pa

3.6.8

pdf = pd.DataFrame({'x':range(3), 'y':['a','b','b'], 'z':['a','b','b']})
print(pdf.head())

   x  y  z
0  0  a  a
1  1  b  b
2  2  b  b

df = ks.from_pandas(pdf)
df.columns = ['x', 'y', 'z1']
df['x2'] = df.x * df.x
df['x2']

0    0
1    1
2    4
Name: x2, dtype: int64
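If you want to confirm which of the two conditions from the question applies before retrying pip, here is a minimal sketch. The helper name install_warnings is made up for illustration, and the 3.7 check only reflects what the GitHub page and issue said at the time of this thread, not a general rule about pyarrow. The cmake check mirrors the "unable to execute 'cmake'" line in the log, which suggests pip fell back to building pyarrow from source.

```python
import shutil
import sys


def install_warnings(version, cmake_path):
    """Return warnings for a (major, minor, ...) version tuple and a cmake path (or None).

    Hypothetical helper: it only restates the two conditions discussed in this thread.
    """
    warnings = []
    # Assumption taken from the thread: pyarrow did not install out of the box on 3.7.
    if tuple(version[:2]) >= (3, 7):
        warnings.append("Python 3.7 or later: the thread reports pyarrow may need a source build")
    # The log shows the source build failing because cmake could not be executed.
    if cmake_path is None:
        warnings.append("cmake not found on PATH: a source build of pyarrow cannot run")
    return warnings


if __name__ == "__main__":
    # shutil.which returns None when the executable is not on PATH.
    for line in install_warnings(sys.version_info, shutil.which("cmake")):
        print(line)
```

An empty result does not guarantee that the install succeeds; it only rules out the two causes visible in the question's log.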

thePurplePython
  • I did, and in the "does not work out of the box for 3.7..." link they suggest remedies but none worked for me. – Frank B. May 02 '19 at 16:16