
I'm curious about what goes on behind the scenes when using argparse. I've checked here and here, as apparently Namespace presently only exists in the argparse library.

It's possible I'm using the wrong keywords to search SO/Google. It's also possible that I'm asking a senseless or obvious question, but here we go.

When capturing a string of input in Python via argparse as such:

>python palindrome.py 'Taco cat!?'

When running the code below, I would expect that by specifying parser.add_argument('string'... the resulting Namespace acts as a buffer for the single input string.

The next line where I assign the string to "args" must be the first time we actually parse the input, incurring a workload proportionate to the length of the input string. At this point, "args" actually contains a Namespace object, which cannot be parsed via for loop or otherwise (that I know of).

Finally, in order to parse the input using "for" or some other loop, I use the Namespace object to populate a string. I'm curious how many times this process incurs a compute time proportionate to the original string length?

Which of these steps copy by address and which copy by value behind the scenes? It looks like the optimistic worst case would be 2x: once to create the Namespace object, then once again to assign its content to "arg_str".

#! /usr/bin/env python
import sys
import argparse

parser = argparse.ArgumentParser(description='Enter string to see if it\'s a palindrome.')
parser.add_argument('string', help="string to be tested for palindromedness..ishness")
args = parser.parse_args()

arg_str = args.string

# I can parse by using 'for i in arg_str:' but if I try to parse 'for i in args:'
# I get TypeError: 'Namespace' object is not iterable

Thanks for looking!!

hitjim

2 Answers


The operating system (or the shell) first parses the command line, passing the strings to the Python interpreter, where they are accessible to you as the sys.argv list.

python palindrome.py 'Taco cat!?'

becomes

['palindrome.py', 'Taco cat!?']

parser.parse_args() processes those strings, generally by just passing references around. When a basic argument is 'parsed', that string is 'stored' in the Namespace with setattr(namespace, dest, value), which in your example would be equivalent to setattr(namespace, 'string', sys.argv[1]).
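A minimal sketch of that reference passing, assuming CPython's argparse and a positional argument with no type= conversion. The list fake_argv is a hypothetical stand-in for sys.argv so the snippet runs without real command-line input:

```python
import argparse

# Hypothetical stand-in for sys.argv
fake_argv = ['palindrome.py', 'Taco cat!?']

parser = argparse.ArgumentParser()
parser.add_argument('string')
args = parser.parse_args(fake_argv[1:])

arg_str = args.string

# Check whether all three names refer to one and the same string object
same_object = (args.string is fake_argv[1]) and (arg_str is args.string)
print(same_object)
```

On CPython this should report that the three names share one object, meaning none of these steps walked over the characters of the input.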

There is nothing special about argparse.Namespace. It is a simple subclass of object. The arguments are simple object attributes. argparse uses setattr and getattr to access these, though users can normally use the dot format (args.string). It does not do any special string handling. That is entirely Python's responsibility.

The Namespace is not an iterable, that is, it is not a list or tuple or anything like that. It is an object. The namespace can be converted to a dictionary with vars(args) (that's in the argparse documentation). So you could iterate over that dictionary using keys and items.
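As a rough illustration of the last two paragraphs, the sketch below builds a Namespace by hand (parse_args() returns the same kind of object) and then reads it through getattr and vars(). The variable names are only for illustration:

```python
import argparse

# Build a Namespace directly instead of going through parse_args()
args = argparse.Namespace(string='Taco cat!?')

# Attribute access by name, equivalent to args.string
value = getattr(args, 'string')

# vars() exposes the attributes as a dict, which can be iterated
as_dict = vars(args)
for name, val in as_dict.items():
    print(name, val)

# The stored string itself is iterable, character by character
letters = [ch for ch in args.string if ch.isalpha()]
```

So 'for i in args:' fails, while 'for i in vars(args):' or 'for i in args.string:' both work.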

One other thing. Don't test your args.string with is. Use == or in [...] to compare it to other strings. That is because a string that is created via sys.argv does not have the same id as one created via x = 'test'. To get an idea why, try:

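# Python 3 syntax; the identity results below are CPython behavior
argv = 'one two three'.split()
print(argv[0] == 'one')   # True
print(argv[0] is 'one')   # False (Python 3.8+ also emits a SyntaxWarning here)
print(argv[0] in ['one', 'two', 'three'])   # True
x = 'one'
print(x is 'one')   # True
print(id(x))
print(id('one'))
print(id(argv[0]))   # differs from the two ids above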

Where possible, Python does keep unique copies of strings, but if the strings are generated in different ways, they will have different ids and will not satisfy the is test.

hpaulj
  • I was unaware of argparse using sys.argv, and that I could use sys.argv on its own. Thanks for bridging that gap, not to mention offering additional pitfalls! Of course, now that you've steered me straight, [here's the evidence](http://docs.python.org/dev/library/argparse.html#the-parse-args-method)! – hitjim Nov 20 '13 at 18:57

Python assignment never makes copies of things. Variables are names for objects; assigning an object to a variable gives the object a new name (taking the name from whatever had it before). In low-level terms, it amounts to a pointer copy and a few refcount operations.

No part of this code requires a copy to be made of the input string. If it did, though, it wouldn't matter. Command-line arguments can't (and shouldn't) get long enough for the time you might spend copying to be significant compared to the rest of your runtime.
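A small sketch of the difference between binding a name and building a new string. The names big, alias and part are made up for the example:

```python
# A string of about a million characters
big = 'x' * 1_000_000

# Assignment binds a second name to the same object; nothing is copied
alias = big
same = alias is big

# A partial slice, by contrast, does build a new string object
part = big[1:]
different = part is not big

print(same, different)
```

Only operations that build a new string, such as the slice here, cost time proportional to its length. Plain assignment like arg_str = args.string does not.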

user2357112
  • While the checked answer got bonus points for detail, this certainly got me on the right track to understanding. Very concise, and you specifically addressed the performance-paranoia-themed tone of my question. – hitjim Nov 20 '13 at 19:01