GENSIM Error in Canopy Express

Question

I am trying to run the gensim topic modeling example in Canopy Express and get the following error on the sum() line.

from gensim import corpora, models, similarities
from itertools import chain

""" DEMO """
documents = ["Human machine interface for lab abc computer applications",
         "A survey of user opinion of computer system response time",
         "The EPS user interface management system",
         "System and human system engineering testing of EPS",
         "Relation of user perceived response time to error measurement",
         "The generation of random binary unordered trees",
         "The intersection graph of paths in trees",
         "Graph minors IV Widths of trees and well quasi ordering",
         "Graph minors A survey"]

# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
     for document in documents]

# remove words that appear only once
all_tokens = sum(texts, [])
tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
texts = [[word for word in text if word not in tokens_once] for text in texts]

The error I get is TypeError: an integer is required. The code seems to work in regular Python, but it fails in Canopy. It looks related to how Canopy treats the sum call, but I'm not sure how to work around it. Any ideas? I'm just getting started with Python and text analysis.

Thank you. This helps a lot as I am a beginner and just trying to learn the package. I saw a similar question that pointed to the sum statement but did not elaborate on how to address it. Your answer provides that. Thank you again. — user3890455, Dec 15 '14 at 14:35

score 0 · Accepted Answer · answered Dec 15 '14 at 04:47

Canopy Python itself is "regular Python 2.7". However, the Python pane in the Canopy GUI is an IPython QTConsole, which adds a layer of functionality, mostly for better, but on rare occasion for worse. By default it starts in Pylab mode, which can be confusing to beginners (see https://support.enthought.com/entries/25750190-Modules-are-already-available-in-Canopy-s-Python-PyLab-prompt-but-not-in-a-script).

You don't describe what you are doing with any precision, but from the symptom that you describe, it sounds as if you are running your commands one-by-one at the IPython prompt, either by copy-paste or by selecting your commands in the text editor and doing "Run Selection". In the IPython prompt, because Pylab does an implicit from numpy import *, the sum function refers to numpy's sum, rather than the built-in Python sum, which would account for the error message that you report.
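To illustrate the shadowing described above, here is a minimal sketch of how to check which sum is currently bound and how to call the built-in one explicitly, even if another sum has replaced it in the interactive namespace. The helper name is_builtin_sum and the sample texts list are made up for illustration; the try/except import is only there so the snippet runs on both Python 2.7 and Python 3.

```python
try:
    import builtins                   # Python 3
except ImportError:
    import __builtin__ as builtins    # Python 2.7 (the version Canopy ships)


def is_builtin_sum(func):
    """Return True if func is the interpreter's built-in sum (hypothetical helper)."""
    return func is builtins.sum


# Small stand-in for the tokenized documents from the question.
texts = [["human", "interface"], ["graph", "trees"]]

# Calling the built-in through its module bypasses any shadowing name,
# so this flattens the list of lists regardless of what "sum" refers to.
all_tokens = builtins.sum(texts, [])
```

At the Canopy prompt, is_builtin_sum(sum) should come back False while Pylab mode is active, assuming the diagnosis above is right.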

Three different solutions (out of many):

1) If you simply run your script (rather than "Run Selection" or copy/paste commands), it should act as expected. This is the most robust, flexible solution.

2) Disable Pylab mode in Canopy preferences; then you can run your commands either way.

3) (Not a great solution, but instructive.) Run del sum at the IPython prompt. This deletes the numpy sum from the IPython namespace, uncovering the original built-in sum and allowing your code to run either way.
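Beyond the three options listed, another possibility is to avoid sum for list flattening altogether, so the code behaves the same whether or not Pylab has replaced the name. The following is only a sketch of that idea, not the gensim tutorial's own code: it swaps in itertools.chain.from_iterable for the flattening step and collections.Counter for the counting step, and the short texts list is a made-up sample.

```python
from collections import Counter
from itertools import chain

# Made-up sample standing in for the tokenized documents.
texts = [["human", "interface", "computer"],
         ["survey", "user", "computer"],
         ["user", "interface", "system"]]

# Flatten the list of lists without calling sum().
all_tokens = list(chain.from_iterable(texts))

# Count every token once, instead of calling all_tokens.count() per word.
counts = Counter(all_tokens)
tokens_once = set(word for word, n in counts.items() if n == 1)

# Remove words that appear only once, as in the original example.
filtered = [[word for word in text if word not in tokens_once]
            for text in texts]
```

Since this never touches the name sum, it should run identically as a script, via "Run Selection", or pasted at the prompt.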
