
This may be a common problem. My data consists of a few million records with 10 columns, including fields such as user agent and IP address. Before feeding the data into ML models for training, the unique strings in each column are mapped to integers, and the mappings are saved using pickle. The data arrives incrementally: the dictionaries are unpickled and reused to map each new data set. As the dictionaries grow, I am facing RAM-usage issues, but only on the last two fields mentioned above. Could you suggest an alternative for this situation, and explain why there is a memory spike even though a large amount of memory is available?

Memory size: 64 GB. Input dictionary size: 2 GB. Input file size: around 5 GB, with length 32,432,769.
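The incremental mapping workflow described above can be sketched as follows. This is a minimal sketch, not the asker's actual code: the function name `encode_column`, the sample values, and the pickle file path are assumptions for illustration.

```python
import os
import pickle
import tempfile

def encode_column(values, mapping):
    """Map strings to integers, extending the mapping for unseen values."""
    next_id = len(mapping)
    out = []
    for v in values:
        if v not in mapping:
            mapping[v] = next_id
            next_id += 1
        out.append(mapping[v])
    return out

# Hypothetical path for the pickled dictionary (one per column).
path = os.path.join(tempfile.mkdtemp(), "user_agent_map.pkl")

# First batch: build the mapping and persist it.
mapping = {}
batch1 = encode_column(["a", "b", "a"], mapping)
with open(path, "wb") as f:
    pickle.dump(mapping, f)

# Next batch: unpickle the mapping and extend it with new values.
with open(path, "rb") as f:
    mapping = pickle.load(f)
batch2 = encode_column(["b", "c"], mapping)
```

Because the dictionary must hold every unique string ever seen, high-cardinality columns such as user agent and IP address make it grow without bound across batches, which is where the RAM pressure comes from.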

  • 1
    change line 7 in your code to: show the code please :) and then in line 9: we will help you :) – Drako Jan 17 '20 at 11:48
  • The code in question (the explicit `file.close()` was redundant inside the `with` block and is dropped here):

        with open(filename, 'rb') as file:
            Dict = pickle.load(file)
        new_count = Dict_count[j]
        print("The count of the dict is: ", new_count)
        for i in range(len(X)):
            if X[i][j] not in Dict:
                Dict[X[i][j]] = new_count
                new_count += 1
            X[i][j] = Dict[X[i][j]]
        exec("%s = %a" % (x_columns[j], Dict))
        new_count_list.append(new_count)

    – Supritha Bhasker Jan 20 '20 at 05:49

0 Answers