This may be a common question. My data consists of a few million records with 10 columns, including fields such as user agent and IP address. Before feeding the data into ML models for training, every unique string is mapped to an integer, and the mappings are saved using pickle. The data arrives incrementally; each time, the dictionaries are unpickled and reused to map the new batch. As the dictionaries grow, I'm running into RAM issues, but only for the two fields mentioned above (user agent and IP address). Could you suggest an alternative for this situation, and explain why there is a memory spike even though plenty of memory is available?
Memory size: 64 GB. Input dictionary size: 2 GB. Input file size: ~5 GB, with length 32,432,769.
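For reference, here is a minimal sketch of the incremental string-to-integer encoding described above (the file name, function names, and sample batch are illustrative, not from my actual pipeline):

```python
import os
import pickle

MAPPING_FILE = "user_agent_map.pkl"  # hypothetical path, one mapping per column

def load_mapping(path):
    """Load an existing string->int mapping from disk, or start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}

def encode_column(values, mapping):
    """Map each string to a stable integer id, extending the mapping in place."""
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # assign the next unused id
        encoded.append(mapping[v])
    return encoded

def save_mapping(mapping, path):
    """Persist the (possibly grown) mapping for the next incremental batch."""
    with open(path, "wb") as f:
        pickle.dump(mapping, f, protocol=pickle.HIGHEST_PROTOCOL)

# one incremental batch
mapping = load_mapping(MAPPING_FILE)
batch = ["Mozilla/5.0", "curl/7.68", "Mozilla/5.0"]  # sample values
ids = encode_column(batch, mapping)
save_mapping(mapping, MAPPING_FILE)
```

The RAM spike happens because unpickling materializes the entire dictionary in memory at once, and high-cardinality columns like user agent and IP address keep accumulating new keys across batches.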