
I have a Python script that returns a pandas DataFrame. I want to run the script from a Jupyter notebook and save the result to a variable.

The data are in a file called data.csv. A shortened version of dataframe.py, the script whose result I want to access in my Jupyter notebook, is:

# dataframe.py
import pandas as pd
import sys

def return_dataframe(file):
    df = pd.read_csv(file)
    return df

if __name__ == '__main__':
    return_dataframe(sys.argv[1])

I tried running:

data = !python dataframe.py data.csv

in my Jupyter notebook but data does not contain the dataframe that dataframe.py is supposed to return.

jtlz2
tshwizz

1 Answer

This is how I did it:

# dataframe.py 
import pandas as pd
import sys

def return_dataframe(f): # `file` was a built-in in Python 2; avoid shadowing it
    df = pd.read_csv(f)
    return df

if __name__ == '__main__':
    return_dataframe(sys.argv[1]).to_csv(sys.stdout,index=False)

Then in the notebook you need to convert an 'IPython.utils.text.SList' into a DataFrame as shown in the comments to this question: Convert SList to Dataframe:

data = !python3 dataframe.py data.csv
df = pd.DataFrame(data=data)[0].str.split(',',expand=True)
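
Note that splitting on commas keeps the header line as the first data row and leaves every column as strings. A possible alternative, sketched here under the assumption that the captured output is plain CSV with a header line, is to join the lines back together and let pd.read_csv parse them. The helper name lines_to_dataframe and the sample rows are made up for illustration; in the notebook the list would be the SList returned by the `!python3` call.

```python
import io

import pandas as pd


def lines_to_dataframe(lines):
    """Parse a list of CSV text lines (header line first) into a DataFrame."""
    # Rejoin the lines into one CSV document and parse it from memory.
    return pd.read_csv(io.StringIO("\n".join(lines)))


# Made-up sample standing in for the SList captured in the notebook.
captured = ["a,b", "1,2", "3,4"]
df = lines_to_dataframe(captured)
```

With this approach the first line becomes the column names and numeric columns get numeric dtypes, which the split-based version does not do.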

If the script only reads the CSV and writes it straight back out as CSV, you could simply read the file directly in the notebook:

df = pd.read_csv('data.csv')
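
If the function does more than read the file, another option may be to import it into the notebook rather than running the script as a subprocess, so the DataFrame is returned directly. The sketch below is self-contained for demonstration only: it writes a stand-in dataframe.py and a made-up data.csv into a temporary directory (both are assumptions, not the real files). When the notebook sits next to the real dataframe.py, `from dataframe import return_dataframe` should be enough.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway directory holding a stand-in script and made-up data (assumptions).
workdir = Path(tempfile.mkdtemp())
(workdir / "dataframe.py").write_text(
    "import pandas as pd\n"
    "\n"
    "def return_dataframe(f):\n"
    "    return pd.read_csv(f)\n"
)
(workdir / "data.csv").write_text("a,b\n1,2\n3,4\n")

# Make the directory importable, then import the module by name.
sys.path.insert(0, str(workdir))
module = importlib.import_module("dataframe")

# Call the function directly; the result is a DataFrame, not captured text.
df = module.return_dataframe(workdir / "data.csv")
```

Because the `if __name__ == '__main__':` block does not run on import, nothing is printed and no text parsing is needed.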
mechanical_meat