16

I have installed gensim (through pip) in Python. After the installation is over I get the following warning:

C:\Python27\lib\site-packages\gensim\utils.py:855: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")

How can I rectify this?

I am unable to import word2vec from gensim.models due to this warning.

I have the following configurations: Python 2.7, gensim-0.13.4.1, numpy-1.11.3, scipy-0.18.1, pattern-2.6.

user7420652

2 Answers

34

You can suppress the message with this code before importing gensim:

import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

import gensim
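
If the goal is specifically to get word2vec imported, the same filter works immediately before that import as well. A minimal sketch, assuming only the standard gensim.models.word2vec module path (the filter call is the one from above):

import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

from gensim.models import word2vec  # should now import without printing the warning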
Roland Pihlakas
  • @user7420652 Hey, thanks for your reply and happy to know! Stack Overflow works like this: instead of commenting (unless you want to add more info), you can upvote answers that are helpful, and if the problem is solved, choose one of the answers as "the solution" by clicking the check mark to the left of that answer. – Roland Pihlakas Feb 16 '17 at 13:14
  • 2
    Anyone knows what's the point of the warning though? it currently pops up also upon the first time the gensim is imported in code after installation. – matanster Jun 12 '18 at 06:20
16

I think this is not a big problem. Gensim is just letting you know that it will alias chunkize to a different function because you are using a specific OS (Windows).

Check out this code from gensim.utils:

if os.name == 'nt':
    logger.info("detected Windows; aliasing chunkize to chunkize_serial")

    def chunkize(corpus, chunksize, maxsize=0, as_numpy=False):
        for chunk in chunkize_serial(corpus, chunksize, as_numpy=as_numpy):
            yield chunk
else:
    def chunkize(corpus, chunksize, maxsize=0, as_numpy=False):
    """
    Split a stream of values into smaller chunks.
    Each chunk is of length `chunksize`, except the last one which may be smaller.
    A once-only input stream (`corpus` from a generator) is ok, chunking is done
    efficiently via itertools.

    If `maxsize > 1`, don't wait idly in between successive chunk `yields`, but
    rather keep filling a short queue (of size at most `maxsize`) with forthcoming
    chunks in advance. This is realized by starting a separate process, and is
    meant to reduce I/O delays, which can be significant when `corpus` comes
    from a slow medium (like harddisk).

    If `maxsize==0`, don't fool around with parallelism and simply yield the chunksize
    via `chunkize_serial()` (no I/O optimizations).

    >>> for chunk in chunkize(range(10), 4): print(chunk)
    [0, 1, 2, 3]
    [4, 5, 6, 7]
    [8, 9]

    """
    assert chunksize > 0

    if maxsize > 0:
        q = multiprocessing.Queue(maxsize=maxsize)
        worker = InputQueue(q, corpus, chunksize, maxsize=maxsize, as_numpy=as_numpy)
        worker.daemon = True
        worker.start()
        while True:
            chunk = [q.get(block=True)]
            if chunk[0] is None:
                break
            yield chunk.pop()
    else:
        for chunk in chunkize_serial(corpus, chunksize, as_numpy=as_numpy):
            yield chunk
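
The doctest in that docstring already shows the behaviour you get on any OS, and on Windows the serial fallback produces exactly the same chunks. A quick sketch to check this yourself, using only the public gensim.utils.chunkize shown above (the warning filter is optional and just hides the message):

import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

from gensim.utils import chunkize

# Splits the stream into lists of length 4, with a shorter final chunk.
for chunk in chunkize(range(10), 4):
    print(chunk)
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]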
Ayush Jain