
I am trying to add columns to a fairly large CSV file (around 300 MB), and I have successfully used the same script to add columns to smaller files in the same format. The script I am running:

import pandas as pd

df = pd.read_csv("20211003_skyrim_comment_data.csv")

# read the word list and build a template dict of zero counts
words = open("conflict_words.txt")
words = words.readlines()
list_of_words = []

main_dict = {}

for word in words:
    list_of_words.append(word.strip())
    main_dict[word.strip()] = 0

dictionary_per_row = main_dict

list_of_dicts = []

# for each row, flag which conflict words appear in the comment column (row[3])
for row in df.itertuples():
    dictionary_per_row = main_dict.copy()
    for word in list_of_words:
        if word in row[3]:
            dictionary_per_row[word] += 1
    list_of_dicts.append(dictionary_per_row)

df_to_append = pd.DataFrame(list_of_dicts)

new_df = df.join(df_to_append)

new_df.to_csv("/home/brianebrahimi/Desktop/scripts/20211027_skyrim_comment_data_with_conflict_scores.csv")

It basically just takes words from a file and puts them into a list, then checks a column of the CSV file to see if each row contains those words. The script seems to run fine, and this is the output:

[Finished in 58.0s with exit code -9]
[cmd: ['/usr/bin/python3', '-u', '/home/brianebrahimi/Desktop/scripts/conflict_scores.py']]
[dir: /home/brianebrahimi/Desktop/scripts]
[path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]

But the file is never saved. What is exit code -9? Is the file too large?

BrianEbrahimi
  • A non-zero exit code usually indicates a problem. `-9` is an [indicator of OOM](https://stackoverflow.com/questions/18529452/) – Marat Nov 17 '21 at 21:50
  • @Marat would getting more RAM solve this issue? – BrianEbrahimi Nov 17 '21 at 21:53
  • 1
    Yes, but there are also a lot of potential optimizations to this code – Marat Nov 17 '21 at 21:53
  • You never close the file. I am not sure how big it is though. Try to use `with open` instead. Also, make sure that the output dir exists. – Snake_py Nov 17 '21 at 22:08
  • Try adding `chunksize=10000` to `read_csv` and `to_csv` and play with the numbers. Also, https://towardsdatascience.com/why-and-how-to-use-pandas-with-large-data-9594dda2ea4c – Emma Nov 17 '21 at 22:08
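
For reference, here is a minimal sketch that combines the suggestions in the comments above: reading the word list with `with open` and processing the CSV in chunks via `chunksize` so the whole file is never held in memory at once. The chunk size, the output filename, and the positional comment column (index 2, matching `row[3]` from `itertuples`) are assumptions to adjust for the real data:

import pandas as pd

# read the word list with a context manager so the file is closed automatically
with open("conflict_words.txt") as f:
    list_of_words = [line.strip() for line in f if line.strip()]

out_path = "20211027_skyrim_comment_data_with_conflict_scores.csv"
first_chunk = True

# chunksize makes read_csv return an iterator of smaller DataFrames
# instead of loading the whole ~300 MB file at once
for chunk in pd.read_csv("20211003_skyrim_comment_data.csv", chunksize=10000):
    comments = chunk.iloc[:, 2]  # assumed comment column, matching row[3] in itertuples
    counts = pd.DataFrame(
        {word: comments.str.contains(word, regex=False, na=False).astype(int)
         for word in list_of_words}
    )
    # append each processed chunk to the output file, writing the header only once
    # (delete any existing output file first, since mode="a" appends)
    chunk.join(counts).to_csv(out_path, mode="a", header=first_chunk, index=False)
    first_chunk = False

Only one chunk and its word counts live in memory at a time, which should keep the process under the limit that triggers the `-9` (SIGKILL) from the out-of-memory killer.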

0 Answers