I am trying to tokenize strings (rather long strings from 10-K reports) for a dataset with 56252 observations. However, the kernel keeps crashing, usually about a quarter of the way through the dataset.

I tried:

  1. Running the .py file without Jupyter, which fails with the error: zsh: killed
  2. Simply using line[j].split(' ') instead of word_tokenize(line[j]) (below)
  3. Reinstalling Python and Jupyter.

None of this has worked, so any feedback would be appreciated.

from nltk.tokenize import word_tokenize

output = [[20120831,20120808,1199,1928175839, 'words section one report', 'words section onea report', 'words section seven report'],[20150621,20141231,1239,1124966666, 'more words fly kite big', 'different words compared section before', 'even more different words']]

item_1 = []
item_1a = []
item_7 = []

count = 0
for line in output:
    try:
        # The first four fields are metadata shared by all three items.
        item_entry = list(line[:4])
        # Tokenize the three text fields (Items 1, 1A and 7).
        item_tokens_to_add = [word_tokenize(text) for text in line[4:7]]
        item_1.append(item_entry + item_tokens_to_add[0])
        item_1a.append(item_entry + item_tokens_to_add[1])
        item_7.append(item_entry + item_tokens_to_add[2])
        count += 1
    except Exception as exc:
        # Report the failure instead of silently skipping the row.
        print(f'Skipped a row: {exc!r}')
    print(f'{count}/{len(output)}')

Here is some information from the Jupyter log:

error 12:27:24.797: Disposing session as kernel process died ExitCode: undefined, Reason: /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/traitlets/traitlets.py:2202: FutureWarning: Supporting extra quotes around strings is deprecated in traitlets 5.0. You can use 'hmac-sha256' instead of '"hmac-sha256"' if you require traitlets >=5.
  warn(
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/traitlets/traitlets.py:2157: FutureWarning: Supporting extra quotes around Bytes is deprecated in traitlets 5.0. Use '0a146f2b-abdf-428e-b31f-d08fde1c7026' instead of 'b"0a146f2b-abdf-428e-b31f-d08fde1c7026"'.
  warn(

I have removed all punctuation from the text, so I don't know why I'm getting a message about "extra quotes". Also, the kernel crashes at a different observation on each run.
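Since the log only shows warnings, one thing worth checking is how much memory the token lists take. The following is a rough sketch, not a diagnosis: it assumes memory is the suspect (the log does not say so), uses the standard-library `tracemalloc` module, and uses `str.split` in place of `word_tokenize` so it runs without NLTK data. The helper name `tokens_memory` and the synthetic `sample` string are made up for illustration.

```python
import tracemalloc

def tokens_memory(text):
    """Return (peak traced bytes, token count) for splitting `text` into a list."""
    tracemalloc.start()
    tokens = text.split(' ')
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, len(tokens)

# Synthetic stand-in for one long report section (hypothetical data).
sample = ' '.join(f'word{i}' for i in range(100_000))
peak, n = tokens_memory(sample)
print(f'{n} tokens from a {len(sample) / 1e6:.1f} MB string, '
      f'about {peak / 1e6:.1f} MB allocated at peak')
```

A list of many small Python strings generally takes several times the memory of the original string, so three token lists per row across 56252 rows can add up quickly.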

  • I don't think it's the code... runs fine on my Mac's Jupyter instance. I'm surprised a 'warning' is killing your kernel. Though a source online recommends trying `pip install pywin32==228`. Just to be sure, are you keeping an eye on your RAM usage when you run it? How much RAM do you have on your machine? – K. W. Cooper Sep 05 '22 at 11:11
  • See also: https://github.com/microsoft/vscode-jupyter/issues/9347 – K. W. Cooper Sep 05 '22 at 11:16
  • @k-w-cooper Thanks for your reply. I have 8 GB of RAM and only have VSCode open and nothing else. I believe I shut down most background processes and there's about 3GB memory free. I tried `pip install pywin32==228` but that did not resolve it either. – princesskaguya666 Sep 05 '22 at 11:27
  • try `conda install ipykernel --update-deps --force-reinstall` – K. W. Cooper Sep 05 '22 at 11:29
  • ie. from this discussion: https://stackoverflow.com/questions/67036168/kernel-died-with-exit-code-1vs-code – K. W. Cooper Sep 05 '22 at 11:29
  • Thanks, with kernel crashes it never hurts to rule out RAM. The conclusion from both discussions is that the traitlets bug has something to do with dependencies between ipython and VSCode. – K. W. Cooper Sep 05 '22 at 11:33
  • @k-w-cooper The traitlets bug seems to be the issue. After downgrading to version 4.3.3, it still crashed, and the kernel then would not start at all. I went through some steps to get the kernel to start, which ultimately came down to upgrading traitlets back to 5.1.1. Similarly, I tried reinstalling pyzmq at version 19.2.0, but that didn't work either. I also tried `!echo $DYLD_LIBRARY_PATH` in the terminal. Anyway, thanks for your input! – princesskaguya666 Sep 05 '22 at 16:29
  • fwiw, I got the same error, but the reason for the kernel crash was that the RAM was too small. Upgrading the RAM fixed it. – katsuya Oct 05 '22 at 21:03
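
If RAM does turn out to be the limit, as the comments above suggest, one possible workaround is to write each tokenized row to disk right away instead of growing three lists in memory. This is a minimal sketch under that assumption, not a confirmed fix. The function name `tokenize_to_csv` and the output file names are hypothetical, and `str.split` stands in for `word_tokenize` so the example runs without NLTK data.

```python
import csv
import os
import tempfile

def tokenize_to_csv(rows, paths, tokenize=str.split):
    """Write one CSV per text column, one row at a time, and return the row count."""
    files = [open(p, 'w', newline='') for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        count = 0
        for row in rows:
            meta = list(row[:4])  # the four metadata fields
            # Columns 4..6 hold the text for Items 1, 1A and 7.
            for writer, text in zip(writers, row[4:7]):
                writer.writerow(meta + tokenize(text))
            count += 1
        return count
    finally:
        for f in files:
            f.close()

# Demo on the first sample row from the question, written to a temporary directory.
rows = [
    [20120831, 20120808, 1199, 1928175839,
     'words section one report', 'words section onea report',
     'words section seven report'],
]
out_dir = tempfile.mkdtemp()
paths = [os.path.join(out_dir, name)
         for name in ('item_1.csv', 'item_1a.csv', 'item_7.csv')]
written = tokenize_to_csv(iter(rows), paths)
print(written, 'rows written to', out_dir)
```

Passing `tokenize=word_tokenize` would use NLTK instead. The memory saving only holds if `rows` is itself read lazily (for example from a file or a generator) and not from a list that already holds every report.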
