
I'm attempting to achieve zero-copy sharing of a Pandas DataFrame between processes launched from separate console sessions. Please consider the following two Python files:

producer.py:

import pandas as pd 
import numpy as np 
import pickle

df = pd.DataFrame({'text': ['a','b','c'], 'ints':[1,2,3], 'floats': [1.0,2,3]})


print(df)
#   text  ints  floats
# 0    a     1     1.0
# 1    b     2     2.0
# 2    c     3     3.0

# prints as expected!


buffers = []
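# collect the out-of-band buffers here: with protocol 5 and a buffer_callback,
# pickle hands each zero-copy-capable buffer to the callback instead of
# embedding it in the pickle stream (PEP 574)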

with open("my_df.pickle", "wb") as f:
    pickle.dump(df, protocol=5, buffer_callback=buffers.append, file=f)


for b in buffers:
    print(len(b.raw()))
# 24
# 24

# Only 2 buffers were collected! I expected 3 (one per column).
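
Since the out-of-band buffers are not written into the pickle file itself, my rough plan for getting them to the other console session is to copy each one into a named shared-memory segment that the consumer can attach to. Here is a minimal sketch of the producer side, assuming made-up segment names of the form my_df_buf_0, my_df_buf_1 (I haven't confirmed this is the right approach, hence the question):

from multiprocessing import shared_memory

segments = []
for i, b in enumerate(buffers):
    data = b.raw()  # contiguous memoryview over the column block's bytes
    shm = shared_memory.SharedMemory(create=True, size=len(data), name=f"my_df_buf_{i}")
    shm.buf[:len(data)] = data   # one copy into shared memory
    segments.append(shm)         # keep references alive so the segments aren't released

# the segments need to remain available until the consumer has attached;
# clean up afterwards with shm.close() / shm.unlink()

This involves one copy into shared memory on the producer side; the zero-copy part I am hoping for is on the consumer side, where the unpickled arrays would ideally be views over those segments.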

Subsequently, I run consumer.py from another console:

import pandas as pd 
import numpy as np 
import pickle


buffers = [pickle.PickleBuffer(bytes(24)), pickle.PickleBuffer(bytes(24))]
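# zero-filled placeholders sized to match the 24-byte lengths printed by producer.py;
# I don't yet know how to get the real buffer contents across to this process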

f = open("my_df.pickle", "rb")
df = pickle.load(f, buffers=buffers)


print(df)
#   text  ints  floats
# 0    a     0     0.0
# 1    b     0     0.0
# 2    c     0     0.0

# Unexpected output: the numerical values are zeroed, and only 1 of the 3 columns ('text') is correctly populated.

It seems that the two PickleBuffers correspond to the numerical columns only, yet their contents are not brought across correctly, whilst the text column is!

(Obviously the intention is to bring across the full DataFrame correctly.)
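
For completeness, this is the consumer-side counterpart I had in mind, pairing with the shared-memory sketch above (again only a sketch with the same made-up segment names, and I'm not certain the reconstructed numeric columns would genuinely be zero-copy views of the shared memory):

from multiprocessing import shared_memory
import pickle

# attach to the segments the producer sketch would have created
segments = []
buffers = []
i = 0
while True:
    try:
        shm = shared_memory.SharedMemory(name=f"my_df_buf_{i}")
    except FileNotFoundError:
        break
    segments.append(shm)                        # keep the segment alive
    buffers.append(pickle.PickleBuffer(shm.buf))
    i += 1

with open("my_df.pickle", "rb") as f:
    df = pickle.load(f, buffers=buffers)

print(df)  # hoping the numeric columns now reference the shared buffers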

Any advice most welcome!

  • `df.to_pickle('my_df.pkl')` produces a file larger by a few bytes... –  Mar 25 '22 at 18:02
  • Yes it does. Unfortunately it doesn't achieve zero-copy. My ultimate goal here concerns the sharing of very large DataFrames, multiple times over. – FunkyOne Mar 25 '22 at 18:39
