I am trying to run the following code from the command line. I need to add an argument parser entry for the *argv argument of my splitter function. When I run the code as shown below, I get an error: TypeError: splitter() got an unexpected keyword argument '*argv'.

Is there a more proper way to add this type of argument? The purpose of the argument is to let people using the function pass anywhere from zero to an unlimited number of values for *argv, and I know the function itself works the way I want it to. I just don't know how to parse the argument.

import pandas as pd
import numpy as np
import argparse


def arg_parse():
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--input_file", required = True)
    parser.add_argument("-s", "--sep", required=True,)
    parser.add_argument("-t", "--target_col", required=True)
    parser.add_argument("-n", "--new_col", required = False, default = None)
    parser.add_argument("-a", "--*argv", required = False, default = None)
    args=parser.parse_args()
    return vars(args)

def splitter(input_file, target_col, sep, new_col = None, *argv):
    df = pd.read_csv(input_file)
    df[target_col] = df[target_col].str.split(sep)
    exploded = df.explode(target_col)
    exploded[target_col].replace(r'^\s*$', np.nan, regex=True, inplace = True)
    exploded.dropna(subset=[target_col], inplace=True)
    if new_col == None:
        return(pd.DataFrame(exploded[[target_col,*argv]]))
    else:
        exploded[new_col] = exploded[target_col]
        return(pd.DataFrame(exploded[[new_col,*argv]]))

if __name__ == '__main__':
    args = arg_parse()
    print(splitter(**args))
  • When debugging, it's a good idea to print `args` (before and/or after `vars`). The error suggests that the Namespace attribute names, or keys, are not what you think they are. I'm not sure how the '*argv' is rendered. – hpaulj May 27 '20 at 20:02
  • That `"--*argv"` argument, whatever its real name, is just an ordinary argument, holding either the None default or a user-provided string. The '*' in the name does not connect it in any way with the '*argv' in the function definition. – hpaulj May 27 '20 at 20:05
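
To make the point in the comments above concrete, here is a minimal sketch of what argparse stores for an option named `--*argv`. The function `takes_varargs` and the sample value `col1` are made up for illustration. The parser is given an explicit list instead of reading the real command line.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-a", "--*argv", required=False, default=None)

# Parse an explicit list so the sketch does not depend on sys.argv.
args = parser.parse_args(["-a", "col1"])

# The destination name is the long option minus its leading dashes,
# so the key is the literal string '*argv'.
print(vars(args))


def takes_varargs(*argv):
    # Hypothetical function: accepts extra positionals, but no keywords.
    return argv


try:
    # Unpacking the dict passes '*argv' as a keyword name, which fails.
    takes_varargs(**vars(args))
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The star in the option name is just a character in a dictionary key, which is why `splitter(**args)` complains about an unexpected keyword argument.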

1 Answer

Don't think of the command-line arguments as being the function arguments, but rather values you will use as function arguments. Be explicit when actually calling splitter.

Also, avoid required=True on options; if an argument must always be supplied, it is better expressed as a positional argument.

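def arg_parse():
    parser = argparse.ArgumentParser()
    # Required values are positionals, filled in the order given.
    parser.add_argument("input_file")
    parser.add_argument("target_col")
    # Optional values keep their flags; --sep defaults to a comma.
    parser.add_argument("--sep", default=",")
    parser.add_argument("-n", "--new_col")
    # nargs="*" collects zero or more trailing values into a list.
    parser.add_argument("argv", nargs="*")
    return parser.parse_args()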

and

if __name__ == '__main__':
    args = arg_parse()
    result = splitter(
        args.input_file,
        args.target_col,
        args.sep,
        args.new_col,
        *args.argv
    )
    print(result)

Then your command line looks something like

yourScript.py -n bar some_file.csv foo arg1 arg2
# First any optional arguments
# Then the required file name and target column
# Finally, any additional arguments for argv
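
As a rough check of how that command line is split up, the sketch below feeds the same tokens to the parser as an explicit list. `fake_splitter` is a hypothetical stand-in for `splitter` that only reports what it received, so the example runs without pandas or a real CSV file. `build_parser` is likewise just a name chosen for this sketch.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("input_file")
    parser.add_argument("target_col")
    parser.add_argument("--sep", default=",")
    parser.add_argument("-n", "--new_col")
    parser.add_argument("argv", nargs="*")
    return parser


def fake_splitter(input_file, target_col, sep, new_col=None, *argv):
    # Hypothetical stand-in for splitter(): returns its arguments unchanged.
    return {
        "input_file": input_file,
        "target_col": target_col,
        "sep": sep,
        "new_col": new_col,
        "argv": argv,
    }


# Same tokens as: yourScript.py -n bar some_file.csv foo arg1 arg2
args = build_parser().parse_args(
    ["-n", "bar", "some_file.csv", "foo", "arg1", "arg2"]
)
received = fake_splitter(
    args.input_file, args.target_col, args.sep, args.new_col, *args.argv
)
print(received)

# With no trailing values, argv is an empty list and nothing extra is passed.
bare = build_parser().parse_args(["some_file.csv", "foo"])
print(bare.argv, bare.new_col, bare.sep)
```

Because `args.argv` is always a list, unpacking it with `*args.argv` works whether the user supplied several extra values or none.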

Though you originally said --sep was required, its value should probably be `,` if you are really working with CSV files. Leave it optional, with a default of `,` that can be overridden as necessary.

chepner