I'm trying to classify texts into categories. I've developed the code which does this, but kfold sample sizes differ on Spyder and Pycharm, even though the code is exactly the same.

This is the code:

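# Imports are assumed (not shown in the original post); these paths match
# the Keras 2.2.4 / scikit-learn 0.20.1 environment listed further down.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import KFold, cross_val_score

# num_words, max_tokens, output_dim, X_train_pad and y_train are defined elsewhere.

def baseline_model():
    model = Sequential()
    embedding_size = 100

    # Map each token index to a dense vector of size embedding_size
    model.add(Embedding(input_dim=num_words,
                        output_dim=embedding_size,
                        input_length=max_tokens,
                        name='embedding_layer'))

    model.add(LSTM(units=150, activation='relu', return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(units=150, activation='relu'))
    model.add(Dense(output_dim, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# 10-fold cross-validation: each fold trains on 9/10 of the data
estimator = KerasClassifier(build_fn=baseline_model, epochs=15, batch_size=128, verbose=1)
kfold = KFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X_train_pad, y_train, cv=kfold)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))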

The total data size is:

>>> X_train_pad.shape
Out[12]: (3320, 56)

This works as expected in Spyder, where each fold trains on 90 percent of the data (2988 of 3320 samples) and tests on the remaining 10 percent:

Epoch 1/15
2988/2988 [==============================] - 19s 7ms/sample - loss: 3.6781 - acc: 0.0971

However, the same code appears to use only 24 samples in PyCharm:

Epoch 1/15
24/24[==============================] - 19s 7ms/sample - loss: 3.6781 - acc: 0.0971

I thought the installed libraries might be responsible, but I would not expect them to cause a problem like this. Any ideas why this happens?

Edit 1: Google Colab shows the same sample size as PyCharm:

Epoch 1/15
24/24[==============================] - 19s 7ms/sample - loss: 3.6781 - acc: 0.0971

Edit 2: If I create a new Anaconda environment and install the latest packages, the sample size is small. If I create the environment and install the following packages, the sample size is large.

> absl-py==0.7.1 alabaster==0.7.12 anaconda-client==1.7.2
> anaconda-navigator==1.9.7 anaconda-project==0.8.2 asn1crypto==0.24.0
> astor==0.8.0 astroid==2.1.0 astropy==3.1 atomicwrites==1.2.1
> attrs==18.2.0 Babel==2.6.0 backcall==0.1.0
> backports.functools-lru-cache==1.5 backports.os==0.1.1
> backports.shutil-get-terminal-size==1.0.0 backports.tempfile==1.0
> backports.weakref==1.0.post1 beautifulsoup4==4.6.3 bitarray==0.8.3
> bkcharts==0.2 blaze==0.11.3 bleach==3.0.2 bokeh==1.0.2 boto==2.49.0
> Bottleneck==1.2.1 certifi==2018.11.29 cffi==1.11.5 chardet==3.0.4
> chart-studio==1.0.0 Click==7.0 cloudpickle==0.6.1 clyent==1.2.2
> colorama==0.4.1 comtypes==1.1.7 conda==4.7.12 conda-build==3.18.9
> conda-package-handling==1.3.11 conda-verify==3.4.2 contextlib2==0.5.5
> cryptography==2.4.2 cycler==0.10.0 Cython==0.29.2 cytoolz==0.9.0.1
> dask==1.0.0 datashape==0.5.4 decorator==4.3.0 defusedxml==0.5.0
> distributed==1.25.1 docutils==0.14 entrypoints==0.2.3
> et-xmlfile==1.0.1 fastcache==1.0.2 filelock==3.0.10 Flask==1.0.2
> Flask-Cors==3.0.7 fsspec==0.4.0 future==0.17.1 gast==0.2.2
> gevent==1.3.7 glob2==0.6 graphviz==0.10.1 greenlet==0.4.15
> grpcio==1.16.1 h5py==2.8.0 heapdict==1.0.0 html5lib==1.0.1 idna==2.8
> imageio==2.4.1 imagesize==1.1.0 imbalanced-learn==0.4.3 imblearn==0.0
> importlib-metadata==0.6 inflection==0.3.1 ipykernel==5.1.0
> ipython==7.2.0 ipython-genutils==0.2.0 ipywidgets==7.4.2 isort==4.3.4
> itsdangerous==1.1.0 jdcal==1.4 jedi==0.13.2 Jinja2==2.10
> joblib==0.13.2 json5==0.8.5 jsonpickle==1.2 jsonschema==2.6.0
> jupyter==1.0.0 jupyter-client==5.2.4 jupyter-console==6.0.0
> jupyter-core==4.4.0 jupyterlab==0.35.3 jupyterlab-server==0.2.0
> Keras==2.2.4 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.0
> keyring==17.0.0 kiwisolver==1.0.1 lazy-object-proxy==1.3.1
> libarchive-c==2.8 llvmlite==0.26.0 locket==0.2.0 lxml==4.2.5
> Markdown==3.1.1 MarkupSafe==1.1.0 matplotlib==3.1.1 mccabe==0.6.1
> menuinst==1.4.14 mistune==0.8.4 mkl-fft==1.0.6 mkl-random==1.0.2
> mock==3.0.5 more-itertools==4.3.0 mpl-finance==0.10.0 mpld3==0.3
> mpmath==1.1.0 msgpack==0.5.6 multipledispatch==0.6.0
> navigator-updater==0.2.1 nbconvert==5.4.0 nbformat==4.4.0
> networkx==2.2 nltk==3.4 nose==1.3.7 notebook==5.7.4 numba==0.41.0
> numexpr==2.6.8 numpy==1.15.4 numpydoc==0.8.0 odo==0.5.1 olefile==0.46
> openpyxl==2.5.12 packaging==18.0 pandas==0.25.1 pandocfilters==1.4.2
> parso==0.3.1 partd==0.3.9 path.py==11.5.0 pathlib2==2.3.3 patsy==0.5.1
> pep8==1.7.1 pickleshare==0.7.5 Pillow==5.3.0 pkginfo==1.4.2
> plotly==4.0.0 pluggy==0.8.0 ply==3.11 prometheus-client==0.5.0
> prompt-toolkit==2.0.7 protobuf==3.8.0 psutil==5.4.8 py==1.7.0
> pycodestyle==2.4.0 pycosat==0.6.3 pycparser==2.19 pycrypto==2.6.1
> pycurl==7.43.0.2 pydotplus==2.0.2 pyflakes==2.0.0 Pygments==2.3.1
> pylint==2.2.2 Pympler==0.7 pyodbc==4.0.25 pyOpenSSL==18.0.0
> pyparsing==2.3.0 pyreadline==2.1 pyrsistent==0.14.11 PySocks==1.6.8
> pytest==4.0.2 pytest-arraydiff==0.3 pytest-astropy==0.5.0
> pytest-doctestplus==0.2.0 pytest-openfiles==0.3.1
> pytest-remotedata==0.3.1 python-dateutil==2.7.5 pytz==2018.7
> PyWavelets==1.0.1 pywin32==223 pywinpty==0.5.5 PyYAML==3.13
> pyzmq==17.1.2 QtAwesome==0.5.3 qtconsole==4.4.3 QtPy==1.5.2
> Quandl==3.4.5 requests==2.21.0 retrying==1.3.3 rope==0.11.0
> ruamel-yaml==0.15.46 scikit-image==0.14.1 scikit-learn==0.20.1
> scipy==1.1.0 seaborn==0.9.0 Send2Trash==1.5.0 simplegeneric==0.8.1
> singledispatch==3.4.0.3 six==1.12.0 snowballstemmer==1.2.1
> sortedcollections==1.0.1 sortedcontainers==2.1.0 soupsieve==1.9.2
> Sphinx==1.8.2 sphinxcontrib-applehelp==1.0.1
> sphinxcontrib-devhelp==1.0.1 sphinxcontrib-htmlhelp==1.0.2
> sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.2
> sphinxcontrib-serializinghtml==1.1.3 sphinxcontrib-websupport==1.1.0
> spyder==3.3.2 spyder-kernels==0.3.0 SQLAlchemy==1.2.15
> statsmodels==0.9.0 SwarmPackagePy==1.0.0a5 sympy==1.3 TA-Lib==0.4.17
> tables==3.4.4 tblib==1.3.2 tensorboard==1.13.1 tensorflow==1.13.1
> tensorflow-estimator==1.13.0 termcolor==1.1.0 terminado==0.8.1
> testpath==0.4.2 toolz==0.9.0 tornado==5.1.1 tqdm==4.28.1
> traitlets==4.3.2 unicodecsv==0.14.1 urllib3==1.24.1 wcwidth==0.1.7
> webencodings==0.5.1 Werkzeug==0.14.1 widgetsnbextension==3.4.2
> win-inet-pton==1.0.1 win-unicode-console==0.5 wincertstore==0.2
> wrapt==1.10.11 xlrd==1.2.0 XlsxWriter==1.1.2 xlwings==0.15.1
> xlwt==1.3.0 zict==0.1.3 zipp==0.5.2
iso_9001_

1 Answer

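OK, I found the problem. As I mentioned in my post, the apparent problem depends on the library versions. It seems that newer Keras versions display the number of batches in the progress bar instead of the sample count. With 2988 training samples per fold and batch_size=128, that is ceil(2988 / 128) = 24 batches, so the same amount of data is used in both environments. Here are similar posts: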

Stackoverflow Post 1

Stackoverflow Post 2
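
To check the numbers, here is a minimal sketch of the arithmetic, assuming the progress bar counts batches as described above. The helper names train_size_per_fold and batches_per_epoch are made up for illustration and are not part of Keras or scikit-learn. The fold sizing follows KFold's documented rule that the first n_samples % n_splits test folds get one extra sample.

```python
import math

def train_size_per_fold(n_samples, n_splits):
    # Hypothetical helper: KFold gives the first (n_samples % n_splits)
    # test folds one extra sample; the training set is everything else.
    base, extra = divmod(n_samples, n_splits)
    test_sizes = [base + 1 if i < extra else base for i in range(n_splits)]
    return [n_samples - t for t in test_sizes]

def batches_per_epoch(n_train, batch_size):
    # Hypothetical helper: the last batch may be partial, so round up.
    return math.ceil(n_train / batch_size)

# Values taken from the question: 3320 samples, 10 folds, batch_size=128
n_samples, n_splits, batch_size = 3320, 10, 128

train_sizes = train_size_per_fold(n_samples, n_splits)
steps = [batches_per_epoch(n, batch_size) for n in train_sizes]

print("training samples per fold:", train_sizes[0])
print("batches per epoch:", steps[0])
```

So "2988/2988" and "24/24" would describe the same training run, counted once in samples and once in batches.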

iso_9001_