First of all, I'd like to state that this is a debugging question for an exercise, but I can't get any help from the lecturer, and as much as I've read up on arguments, I can't seem to figure it out, so here I am. I have a Python script that compares .txt files passed as arguments. Currently it is called as follows:

python compare.py -s stop_list.txt NEWS/news01.txt NEWS/news02.txt

and the files are parsed into a list of names using

import sys, re, getopt, glob

opts, args = getopt.getopt(sys.argv[1:],'hs:bI:')
opts = dict(opts)
filenames = args

if '-I' in opts:
    filenames = glob.glob(opts['-I'])

print('INPUT-FILES:', ' '.join(filenames))
print(filenames)

I can pass more than two files by simply listing them together

python compare.py -s stop_list.txt NEWS/news01.txt NEWS/news02.txt NEWS/news03.txt NEWS/news04.txt

but this can quickly become impractical.

Now it is suggested that more files can be passed using a pattern

python compare.py -s stop_list.txt -I ’NEWS/news??.txt’
or, for example:
python compare.py -s stop_list.txt -I ’NEWS/news0[123].txt’

However, it seems to behave a bit weirdly. First of all, if I write:

python compare.py -s stop_list.txt -I NEWS/news01.txt NEWS/news02.txt

only news01.txt will be passed to the script.
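
For reference, here is a minimal sketch of what getopt does with that command line, assuming the same option string as in the script above. The argv list is a hand-written stand-in for sys.argv[1:], not captured output:

```python
import getopt

# Stand-in for sys.argv[1:] when running the command above.
argv = ['-s', 'stop_list.txt', '-I', 'NEWS/news01.txt', 'NEWS/news02.txt']

# 'I:' means -I takes exactly one value, so it consumes only the next word.
opts, args = getopt.getopt(argv, 'hs:bI:')
opts = dict(opts)

print(opts)  # {'-s': 'stop_list.txt', '-I': 'NEWS/news01.txt'}
print(args)  # ['NEWS/news02.txt']
```

Because '-I' is present, the script then replaces filenames (which held NEWS/news02.txt) with the result of glob.glob('NEWS/news01.txt'), so only the first file survives.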

Next, when I use the pattern as suggested, there is no input whatsoever. I can't tell whether the code for parsing the input files is wrong and needs altering, or whether I'm doing something wrong.

The -h states:

USE: python <PROGNAME> (options) file1...fileN
OPTIONS:
    -h : print this help message
    -b : use BINARY weights (default: count weighting)
    -s FILE : use stoplist file FILE
    -I PATT : identify input files using pattern PATT, 
              (otherwise uses files listed on command line)

Thanks in advance :)

George
  • Just checking: are those special quotes -> ’ or regular ones -> ' ? They look very similar but the fancy ones are included by Python in your string, if that's the case. Try this to test: https://superuser.com/a/581529/717739 – oblio Oct 19 '18 at 12:50
  • Your argument parsing seems to only use the first argument after `-I` as the pattern. You need to mark it as taking a variable number of arguments and probably iterate over the list of patterns you get. Or pass a pattern that includes all your files (which is the intent behind this). – Graipher Oct 19 '18 at 12:56
  • @oblio they are not > ' (simple). On the other hand it's just copy-paste from the pdf we were handed out so I'm not sure if it was intentional, or if it's just how they were printed to the pdf. In any case I still get no input if I change them with > ' . – George Oct 19 '18 at 12:58
  • How do you get the opts variable? Where is it coming from? – oblio Oct 19 '18 at 13:00
  • @oblio oh ffs. apparently it has to be called with double quotes " . Just figured it out by running at random using "news0[123].txt" . – George Oct 19 '18 at 13:03
  • In that case I'm going to post my comment as answer and you could accept it :) – oblio Oct 19 '18 at 13:04
  • @oblio hey I was just wondering what I should read up on cause I have very little knowledge on passing arguments from a command line etc. For example, the pattern used here only works for 10 documents at a time since you do 01, 02, 03... OR 10, 11, 12... If you're (understandably) bored of explaining, could you give me a couple of pointers on what to look at? I don't understand how he gets from 0[123] to 01, 02, 03, and I can't imagine how it could be extended to ...08,09,10,11... – George Oct 19 '18 at 15:55
  • Well, I'm not 100% sure how the Python glob module works, but I imagine it's inspired by the shell globbing system: http://tldp.org/LDP/abs/html/globbingref.html. Read that, play around with it in the shell (bash, for example), and also read this: https://docs.python.org/2/library/glob.html – oblio Oct 19 '18 at 16:02
  • @oblio cool, thanks a lot! you see, when you don't know what you don't know, you don't know where to start. I thought it was about getopt! – George Oct 19 '18 at 16:20
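
On the question in the comments about how 0[123] becomes 01, 02, 03: in glob-style patterns, [123] matches exactly one character out of the listed set and ? matches any single character. There is no "or" operator, so a range like 08 to 11 that crosses a tens boundary would need more than one pattern. The sketch below uses the standard fnmatch module (the matching rules glob is built on) against a made-up list of names, so it runs without any files on disk. The names and the two-pattern split are illustrative assumptions, not part of the exercise:

```python
import fnmatch

# Hypothetical file names news01.txt .. news12.txt
names = ['news%02d.txt' % n for n in range(1, 13)]

# [123] stands for a single character that is 1, 2 or 3.
print(fnmatch.filter(names, 'news0[123].txt'))
# ['news01.txt', 'news02.txt', 'news03.txt']

# 08..11 spans two "tens", so combine two patterns.
patterns = ['news0[89].txt', 'news1[01].txt']
wanted = [n for n in names if any(fnmatch.fnmatch(n, p) for p in patterns)]
print(wanted)
# ['news08.txt', 'news09.txt', 'news10.txt', 'news11.txt']
```

With real files, the same idea would be to call glob.glob once per pattern and concatenate the results.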

1 Answer

Check the quotes. They look like typographic quotes (’) rather than plain ones, and those end up inside the string that Python receives. Try ' or " instead.
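
A small sketch of why the typographic quotes give no input: a shell does not treat ’ as a quoting character, so it arrives as a literal part of the pattern and glob finds nothing that matches. The temporary directory and file names below are made up for the demonstration:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create NEWS/news01.txt .. news03.txt in a scratch directory.
    news = os.path.join(tmp, 'NEWS')
    os.mkdir(news)
    for n in (1, 2, 3):
        open(os.path.join(news, 'news%02d.txt' % n), 'w').close()

    # glob.escape protects the directory part in case it contains [ ] * ?
    plain = os.path.join(glob.escape(news), 'news0[123].txt')
    # Same pattern with a literal U+2019 at each end, as the script would see it.
    curly = '\u2019' + plain + '\u2019'

    plain_matches = sorted(os.path.basename(p) for p in glob.glob(plain))
    curly_matches = glob.glob(curly)

print(plain_matches)  # ['news01.txt', 'news02.txt', 'news03.txt']
print(curly_matches)  # []
```

The comment that only double quotes worked would be consistent with a shell such as Windows cmd.exe, which does not treat ' as a quoting character either. That is a guess, since the thread does not say which shell was used.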

oblio
  • Yeah I just figured out that they had to be " . I guess it was hard to put those in the pdf file... – George Oct 19 '18 at 13:06
  • Sometimes desktop publishing software or word processors convert regular quotes. Good for presentations, bad for code. – oblio Oct 19 '18 at 13:07