We're trying to pull some data from BigQuery using Pandas and are running into this absolutely enormous traceback about imports. As far as I can tell, all the right dependencies are installed.

```
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 879, in _find_spec
AttributeError: 'PyxImporter' object has no attribute 'find_spec'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_forecasts.py", line 117, in <module>
    run_forecasts()
  File "run_forecasts.py", line 50, in run_forecasts
    a.run()
  File "./budgetforecastmodel/prepare/funnel_entrance.py", line 37, in run
    self._project_id
  File "./budgetforecastmodel/prepare/funnel_entrance.py", line 60, in _read_data
    funnel_entrances = fe_query.run()
  File "./budgetforecastmodel/extract/bigquery/biq_query_base.py", line 41, in run
    self.read_data()
  File "./budgetforecastmodel/extract/bigquery/biq_query_base.py", line 141, in read_data
    self.df = pd.read_gbq(self.query, self.project_id)
  File "./venv/lib/python3.6/site-packages/pandas/io/gbq.py", line 100, in read_gbq
    **kwargs)
  File "./venv/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 906, in read_gbq
    dialect=dialect, auth_local_webserver=auth_local_webserver)
  File "./venv/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 202, in __init__
    self.credentials = self.get_credentials()
  File "./venv/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 214, in get_credentials
    credentials = self.get_application_default_credentials()
  File "./venv/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 243, in get_application_default_credentials
    credentials, _ = google.auth.default(scopes=[self.scope])
  File "./venv/lib/python3.6/site-packages/google/auth/_default.py", line 281, in default
    credentials, project_id = checker()
  File "./venv/lib/python3.6/site-packages/google/auth/_default.py", line 158, in _get_gae_credentials
    from google.auth import app_engine
  File "./venv/lib/python3.6/site-packages/google/auth/app_engine.py", line 32, in <module>
    from google.appengine.api import app_identity
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 946, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 881, in _find_spec
  File "<frozen importlib._bootstrap>", line 855, in _find_spec_legacy
  File "./venv/lib/python3.6/site-packages/pyximport/pyximport.py", line 253, in find_module
    fp, pathname, (ext,mode,ty) = imp.find_module(fullname,package_path)
  File "~/.pyenv/versions/3.6.1/lib/python3.6/imp.py", line 270, in find_module
    "not {}".format(type(path)))
RuntimeError: 'path' must be None or a list, not <class '_frozen_importlib_external._NamespacePath'>
```

Our requirements file looks like the following:

```
cachetools==2.0.1
certifi==2017.7.27.1
chardet==3.0.4
et-xmlfile==1.0.1
fbprophet==0.2
google-api-python-client==1.6.4
google-auth==1.1.1
google-auth-httplib2==0.0.2
google-auth-oauthlib==0.1.1
httplib2==0.10.3
idna==2.6
jdcal==1.3
numpy==1.12.1
oauth2client==4.1.2
oauthlib==2.0.4
openpyxl==2.4.8
pandas==0.20.1
pandas-gbq==0.2.0
psycopg2==2.7.3.1
py==1.4.34
pyasn1==0.3.7
pyasn1-modules==0.1.5
pyodbc==4.0.17
pytest==3.2.2
python-dateutil==2.6.0
pytz==2017.2
PyYAML==3.12
requests==2.18.4
requests-oauthlib==0.8.0
rpy2==2.8.5
rsa==3.4.2
six==1.10.0
SQLAlchemy==1.1.9
sqlalchemy-redshift==0.6.0
uritemplate==3.0.0
urllib3==1.22
xlrd==1.1.0
```

This happens on a fresh clone with a fresh virtual environment, running Python 3.6.1 on OS X, and we've reproduced it on two machines. Honestly, we're at a bit of a loss as to where to even start. It worked before, so something has regressed, but the way it dies suggests the cause isn't entirely in our own code.

Literally any clues would be helpful! :)

Knifa
  • From what I can see, you are running this code in App Engine, right? Can you share the code? Is it the Standard environment? – Willian Fuks Nov 16 '17 at 17:23
  • No, this is standalone and running locally. We're simply using `pandas.read_gbq(query, project_id)`, which in turn uses the [BigQuery API library](https://developers.google.com/api-client-library/python/apis/bigquery/v). Nothing fancy on top of that. – Knifa Nov 16 '17 at 17:29
  • Interesting... I don't know why, but it seems to be using `app_identity` from App Engine for authentication (AFAIK this should only happen in the App Engine Standard environment). – Willian Fuks Nov 16 '17 at 17:38
  • Turns out there's some conflict between Pandas/Google BQ/FB's Prophet. [I raised an issue](https://github.com/facebook/prophet/issues/363). Not sure if this qualifies as an answer? – Knifa Nov 17 '17 at 14:35
  • If it's confirmed there's a bug then I think it definitely does. Hopefully other people with the same issue can find this as well. – Willian Fuks Nov 17 '17 at 14:59

2 Answers

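This is a bug caused by a conflict between PyStan (which Facebook's Prophet uses) and one of the Google authentication libraries (which the BigQuery API uses). The traceback points at pyximport: its `PyxImporter` has no `find_spec`, so the import system falls back to the legacy `find_module`, which hands a `_NamespacePath` (apparently that of the `google` namespace package) to `imp.find_module` and gets a `RuntimeError`. Because that is not an `ImportError`, the optional App Engine check in google-auth presumably cannot treat it as "module not installed", and the error propagates.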

There's an [issue on Prophet](https://github.com/facebook/prophet/issues/363) which you can track, but finding a fix looks like it'll be difficult.
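To check whether an environment is affected, it may help to list the importers on `sys.meta_path` that lack `find_spec`, since that is what the first line of the traceback complains about. The following is a minimal diagnostic sketch; the helper names `legacy_finders` and `drop_finders_named` are made up for illustration, and removing `PyxImporter` is an untested workaround that assumes nothing later in the program still needs `.pyx` imports.

```python
import sys


def finder_name(finder):
    """Readable name for a meta path entry (entries can be classes or instances)."""
    return getattr(finder, "__name__", type(finder).__name__)


def legacy_finders(meta_path=None):
    """Return the meta path entries that do not implement find_spec()."""
    if meta_path is None:
        meta_path = sys.meta_path
    return [f for f in meta_path if not hasattr(f, "find_spec")]


def drop_finders_named(class_name, meta_path=None):
    """Remove finder instances of the given class name in place.

    Hypothetical workaround: returns the removed finders so they can be
    put back if something else turns out to depend on them.
    """
    if meta_path is None:
        meta_path = sys.meta_path
    removed = [f for f in meta_path if type(f).__name__ == class_name]
    meta_path[:] = [f for f in meta_path if type(f).__name__ != class_name]
    return removed


if __name__ == "__main__":
    # Run this after importing fbprophet / pystan to see what they installed.
    for finder in legacy_finders():
        print("finder without find_spec:", finder_name(finder))
```

If `PyxImporter` shows up in that list, calling `drop_finders_named("PyxImporter")` just before `pd.read_gbq(...)` might sidestep the error, but treat that as an experiment rather than a fix.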

Knifa

I was able to resolve this by upgrading Cython from 0.25.2 to 0.29 as suggested here.
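A quick way to tell whether a given environment still has the older Cython is to compare its version string against 0.29. This is only a sketch: the helper names are hypothetical, and treating 0.29 as the minimum working version is an assumption based on the upgrade described above, not something confirmed by the Cython changelog.

```python
import re


def version_tuple(version):
    """Turn '0.25.2' into (0, 25, 2), keeping the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(installed, minimum="0.29"):
    """True if the installed version sorts before the assumed minimum."""
    return version_tuple(installed) < version_tuple(minimum)


if __name__ == "__main__":
    try:
        import Cython
    except ImportError:
        print("Cython is not installed in this environment")
    else:
        print(Cython.__version__, "needs upgrade:", needs_upgrade(Cython.__version__))
```

Since Cython is not pinned in the requirements file from the question, it is presumably pulled in as a dependency, so adding an explicit pin there would be one way to keep a fresh virtual environment from picking up an older release.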