UPDATE: Error in command line: ValueError: Invalid file path or buffer object type:

Question

My function now works, as shown below. Although it produces the correct output, I also get a long error in the command line (ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>; screenshot: https://i.stack.imgur.com/PBTwz.png). Does anyone know why I am getting this error even though the output is correct?

import pandas as pd
import numpy as np
import argparse

target_col = "CountryRefs"

sep = ","

input_file = 'Test_set'

def arg_parse():
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--input_file", required = True)
    parser.add_argument("-s", "--sep", required=True,)
    parser.add_argument("-t", "--target_col", required=True)
    args=parser.parse_args()
    return vars(args)

def splitter(input_file, target_col, sep, new_col = None, *argv):
    df = pd.read_csv(input_file)
    df[target_col] = df[target_col].str.split(sep)
    exploded = df.explode(target_col)
    exploded[target_col].replace(r'^\s*$', np.nan, regex=True, inplace = True)
    exploded.dropna(subset=[target_col], inplace=True)
    if new_col == None:
        return(pd.DataFrame(exploded[[target_col,*argv]]))
    else:
        exploded[new_col] = exploded[target_col]
        return(pd.DataFrame(exploded[[new_col,*argv]]))

if __name__ == '__main__':
    args = arg_parse()
    print(splitter(**args))

Argparse error: error: the following arguments are required: -f/--df, -s/--sep, -t/--target_col — Sara Carlin, May 27 '20 at 16:42
you should specify those arguments when calling your function: `python -f something` — avloss, May 27 '20 at 16:45
Do you know how to specify command-line arguments? How are you running this script? From an OS shell window? Or from an IDE like Spyder? — hpaulj, May 27 '20 at 16:53
There's little point in specifying a default value if you mark an option as required. — chepner, May 27 '20 at 16:57
I am running this from the command line. The goal is to bring this into Tableau Prep and create an extension. — Sara Carlin, May 27 '20 at 17:04
Please [edit] your question and add the error message (no images)! — aschipfl, May 27 '20 at 20:44
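
The argparse error quoted in the comments above means the script was started without the options that are marked `required=True`. Below is a minimal sketch of one way around it, assuming only `--input_file` really has to be mandatory and that the other two options can fall back to the values hard-coded at the top of the question's script. `build_parser` is a hypothetical helper name, and `parse_args` is given explicit lists only so the snippet runs without a shell:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # No sensible default for the file, so keep this one required.
    parser.add_argument("-f", "--input_file", required=True)
    # Optional, falling back to the values from the question's script.
    parser.add_argument("-s", "--sep", default=",")
    parser.add_argument("-t", "--target_col", default="CountryRefs")
    return parser


parser = build_parser()

# Only the required option is supplied; the defaults fill in the rest.
only_file = vars(parser.parse_args(["-f", "Test_set"]))
print(only_file)

# Nothing supplied: argparse prints a usage error to stderr and exits.
try:
    parser.parse_args([])
except SystemExit as exc:
    missing_exit_code = exc.code
print(missing_exit_code)
```

From a shell, the first case would correspond to `python script.py -f Test_set`, where `script.py` stands in for whatever the file is actually called.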

avloss · Answer 1 · 2020-05-27T21:36:47.427

You want something like this:

import pandas as pd
import numpy as np
import argparse


target_col = "CountryRefs"

sep = ","

data = {'CountryRefs':['Italy, Germany', 'Japan , France', '', 'Alaska'],
    'Authors':['Dom', 'Xavier', 'Kathleen', 'Joe'], 'Friends':['Amy Pete', 'Joe', None, 'Franklin'],
    'Colors':['red.blue', ' ', 'yellow', 'black.blue']}
df = pd.DataFrame(data, columns = ['CountryRefs', 'Authors', 'Friends', 'Colors'])

def arg_parse():
    parser = argparse.ArgumentParser()
    parser.add_argument('argv', type=str, nargs='*', default=[])
    parser.add_argument("-s", "--sep", dest="sep", required=True, default=',')
    parser.add_argument("-t", "--target_col", dest="target_col", required=True, default='1')

    args=parser.parse_args()
    return vars(args)

def splitter(df, target_col, sep, new_col=None, argv=()):
    # Split each cell into a list, then give every list element its own row.
    df[target_col] = df[target_col].str.split(sep)
    exploded = df.explode(target_col)
    # Assign the result back instead of using inplace=True on a column selection,
    # which does not reliably update `exploded`.
    exploded[target_col] = exploded[target_col].replace(r'^\s*$', np.nan, regex=True)
    exploded = exploded.dropna(subset=[target_col])
    if new_col is None:
        return exploded[[target_col, *argv]]
    exploded[new_col] = exploded[target_col]
    return exploded[[new_col, *argv]]

if __name__ == '__main__':
    args = arg_parse()
    print(splitter(df, **args))

Then you execute this code by calling:

python sep.py -t CountryRefs -s ','

or like this:

python sep.py -t CountryRefs -s ',' Friends Colors
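
To see what those two command lines hand to `splitter`, the parser from this answer can be fed the same tokens as a list. This is only an illustration: the `cli_args` parameter is an addition that is not in the answer's code, used here so the snippet runs without a shell.

```python
import argparse


def arg_parse(cli_args=None):
    # Same parser as in the answer; cli_args=None falls back to sys.argv.
    parser = argparse.ArgumentParser()
    parser.add_argument("argv", type=str, nargs="*", default=[])
    parser.add_argument("-s", "--sep", dest="sep", required=True, default=",")
    parser.add_argument("-t", "--target_col", dest="target_col", required=True, default="1")
    return vars(parser.parse_args(cli_args))


# Equivalent of: python sep.py -t CountryRefs -s ','
first = arg_parse(["-t", "CountryRefs", "-s", ","])

# Equivalent of: python sep.py -t CountryRefs -s ',' Friends Colors
second = arg_parse(["-t", "CountryRefs", "-s", ",", "Friends", "Colors"])

print(first)
print(second)
```

The trailing words that follow the options are collected by the `nargs='*'` positional, so in the second call `argv` holds `Friends` and `Colors`, which `splitter` then uses as the extra columns to keep. The dictionary keys match the parameter names of `splitter`, which is why `splitter(df, **args)` works.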

To make this simpler, you can get rid of the `dest` arguments (they're just specifying what is inferred from the first long option), and you can probably do away with the `required` arguments. (Either them, or the defaults, and it makes more sense to make the options truly optional.) — chepner, May 27 '20 at 17:08
You can't pass `df` as an argument like that, but you can pass a file name as an argument and then create a DataFrame by reading from that file. — avloss, May 27 '20 at 17:44
How do I add a parser argument for the *argv argument in the function? — Sara Carlin, May 27 '20 at 19:05
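
Following the comment about passing a file name instead of `df`, here is a rough sketch of a `splitter` that receives a path and builds the DataFrame itself. The reported message names `pandas.core.frame.DataFrame` as the offending type, which suggests that `pd.read_csv` was handed a DataFrame rather than a path at some point. The parameter name `extra_cols`, the temporary CSV file and its contents (rows borrowed from the answer's sample data) are all illustrative assumptions, and the whitespace trimming is an extra step that the original function does not perform:

```python
import os
import tempfile

import pandas as pd


def splitter(input_file, target_col, sep, extra_cols=()):
    """Read a CSV from a path and return one row per separated value."""
    df = pd.read_csv(input_file)  # expects a path or file object, not a DataFrame
    pieces = df[target_col].str.split(sep)
    exploded = df.assign(**{target_col: pieces}).explode(target_col)
    exploded = exploded.reset_index(drop=True)
    # Trim whitespace around each piece, then drop missing and empty pieces.
    exploded[target_col] = exploded[target_col].str.strip()
    keep = exploded[target_col].notna() & (exploded[target_col] != "")
    return exploded.loc[keep, [target_col, *extra_cols]]


# Hypothetical sample file, written to a temporary directory for the demo.
csv_text = (
    "CountryRefs,Authors\n"
    '"Italy, Germany",Dom\n'
    '"Japan , France",Xavier\n'
    ",Kathleen\n"
    "Alaska,Joe\n"
)
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "Test_set.csv")
    with open(csv_path, "w", encoding="utf-8") as handle:
        handle.write(csv_text)
    # Same shape as the dict that vars(parser.parse_args()) would return.
    args = {
        "input_file": csv_path,
        "target_col": "CountryRefs",
        "sep": ",",
        "extra_cols": ["Authors"],
    }
    result = splitter(**args)

print(result)
```

Because the keys of `args` match the parameter names, a parser whose positional is declared as `extra_cols` with `nargs='*'` could be plugged in unchanged.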
