1

I am trying to create a DataFrame using a function, but I am forced to export it as a pickled object in order to use it afterwards. Is there a more efficient way to use it without the pickle?

import pandas as pd

data = {'string_to_split': ['cava;san felice;  cancello', 'niente;qualcosa;0']}
data = pd.DataFrame(data)

global final_df_name
def extractor(col_name_0, col_name_1, df=data, sep=';', final_df_name='final_df_name'):
    counter = 0
    col_name_0 = df['string_to_split'].apply(lambda x: x.split(sep)[counter])  # first field
    counter += 1
    col_name_1 = df['string_to_split'].apply(lambda x: x.split(sep)[counter])  # second field
    df['var_name_0'] = col_name_0
    df['var_name_1'] = col_name_1
    final_df_name = df
    # The only way I get the result out of the function is by pickling it
    final_df_name.to_pickle("final_df_name")

extractor('col_name_0', 'col_name_1')
test = pd.read_pickle("final_df_name")
Adrian Mole
manuzzo

2 Answers

0

Simply returning the DataFrame from the function may work for you:

# earlier logic

def foo(df):
    operate_on_df(df)
    return df

df = pd.DataFrame(source_data)

df = foo(df)
# continue to work with df

Note that mutating a mutable argument inside a function also changes the object the caller holds, because Python passes a reference to the same object rather than a copy. This means you can use the modified DataFrame directly, without returning and re-assigning it. However, modified objects are often returned anyway for clarity.
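
To make the point about mutable arguments concrete, here is a minimal sketch. The function names `add_length_in_place` and `add_length_copy` and the `text` column are made up for illustration and are not part of the question's code. The first function changes the caller's DataFrame directly; the second works on a copy and returns it.

```python
import pandas as pd


def add_length_in_place(df):
    # Hypothetical helper: mutates the DataFrame the caller passed in,
    # so nothing needs to be returned.
    df["length"] = df["text"].str.len()


def add_length_copy(df):
    # Hypothetical helper: works on a copy and returns it,
    # leaving the caller's DataFrame untouched.
    out = df.copy()
    out["length"] = out["text"].str.len()
    return out


original = pd.DataFrame({"text": ["cava", "niente"]})

result = add_length_copy(original)
copy_left_original_alone = "length" not in original.columns

add_length_in_place(original)
in_place_changed_original = "length" in original.columns

print(copy_left_original_alone, in_place_changed_original)
```

Either style works. Returning the result tends to make it more obvious at the call site that the function produces something you are meant to keep.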

ti7
  • I do not get your suggestion on re-assign (I am quite new on python, in fact I am not exactly a programmer...) but thank you anyway, I have to say that putting the "return" does not work for me. – manuzzo Jul 02 '20 at 13:29
  • Not to worry and welcome to a terrific programming language! I can highly recommend [this handout](https://web.archive.org/web/20180411011411/http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html), the section on labels may be quite relevant here, but you'll find it all very practical. – ti7 Jul 02 '20 at 15:40
0

I found the solution in this question: "Returning a dataframe in python function".

In effect, the correct code was:

import pandas as pd

data = {'string_to_split': ['cava;san felice;  cancello', 'niente;qualcosa;0']}
data = pd.DataFrame(data)

def extractor(col_name_0, col_name_1, df=data, sep=';'):
    # The col_name_0 / col_name_1 arguments are overwritten below
    counter = 0
    col_name_0 = df['string_to_split'].apply(lambda x: x.split(sep)[counter])  # first field
    counter += 1
    col_name_1 = df['string_to_split'].apply(lambda x: x.split(sep)[counter])  # second field
    df['var_name_0'] = col_name_0
    df['var_name_1'] = col_name_1
    # Return the DataFrame instead of pickling it
    return df

# With this last line, the returned DataFrame is stored in data_want:
data_want = extractor(col_name_0='col_name_0', col_name_1='col_name_1')
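
As a possible simplification, the same two columns can be produced without `apply` or a counter. This is only a sketch: `split_columns` and its parameters are names I made up, and it uses `Series.str.split` with `expand=True`, which returns one column per field. Unlike the code above, it returns a copy rather than modifying `data`.

```python
import pandas as pd


def split_columns(df, source_col, new_cols, sep=";"):
    # Hypothetical helper: split each string on sep; expand=True yields
    # one column per field, labelled 0, 1, 2, ...
    parts = df[source_col].str.split(sep, expand=True)
    out = df.copy()  # leave the caller's DataFrame unchanged
    for i, name in enumerate(new_cols):
        out[name] = parts[i]
    return out


data = pd.DataFrame(
    {"string_to_split": ["cava;san felice;  cancello", "niente;qualcosa;0"]}
)
data_want = split_columns(data, "string_to_split", ["var_name_0", "var_name_1"])
print(data_want)
```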
manuzzo