Here is my basic issue:

I have the following:

file name: parseFastq.py
execution: via the command line
command to run it: python3 parseFastq.py --fastq /Users/remaining_dir/test1.fastq

This code works!!!

However, when I copy the components of parseFastq.py into a new script, issues arise.

Below is the code:

The class is defined first. This part works and runs fine in my new script.

import argparse
import gzip
#Example use is 
# python parseFastq.py --fastq /Users/remaining_dir/test1.fastq

################################################
# You can use this code and put it in your own script
class ParseFastQ(object):
    """Returns a read-by-read fastQ parser analogous to file.readline()"""
    def __init__(self,filePath,headerSymbols=['@','+']):
        """Returns a read-by-read fastQ parser analogous to file.readline().
        Exmpl: parser.__next__()
        -OR-
        Its an iterator so you can do:
        for rec in parser:
            ... do something with rec ...

        rec is tuple: (seqHeader,seqStr,qualHeader,qualStr)
        """
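        if filePath.endswith('.gz'):
            # 'rt' opens the gzip stream in text mode, so lines are str rather than bytes
            self._file = gzip.open(filePath, 'rt')
        else:
            # 'rU' is deprecated and was removed in Python 3.11; 'r' already handles universal newlines
            self._file = open(filePath, 'r')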
        self._currentLineNumber = 0
        self._hdSyms = headerSymbols

    def __iter__(self):
        return self

    def __next__(self):
        """Reads in next element, parses, and does minimal verification.
        Returns: tuple: (seqHeader,seqStr,qualHeader,qualStr)"""
        # ++++ Get Next Four Lines ++++
        elemList = []
        for i in range(4):
            line = self._file.readline()
            self._currentLineNumber += 1 ## increment file position
            if line:
                elemList.append(line.strip('\n'))
            else: 
                elemList.append(None)

        # ++++ Check Lines For Expected Form ++++
        trues = [bool(x) for x in elemList].count(True)
        nones = elemList.count(None)
        # -- Check for acceptable end of file --
        if nones == 4:
            raise StopIteration
        # -- Make sure we got 4 full lines of data --
        assert trues == 4,\
               "** ERROR: It looks like I encountered a premature EOF or empty line.\n\
               Please check FastQ file near line number %s (plus or minus ~4 lines) and try again**" % (self._currentLineNumber)
        # -- Make sure we are in the correct "register" --
        assert elemList[0].startswith(self._hdSyms[0]),\
               "** ERROR: The 1st line in fastq element does not start with '%s'.\n\
               Please check FastQ file near line number %s (plus or minus ~4 lines) and try again**" % (self._hdSyms[0],self._currentLineNumber) 
        assert elemList[2].startswith(self._hdSyms[1]),\
               "** ERROR: The 3rd line in fastq element does not start with '%s'.\n\
               Please check FastQ file near line number %s (plus or minus ~4 lines) and try again**" % (self._hdSyms[1],self._currentLineNumber) 
        # -- Make sure the seq line and qual line have equal lengths --
        assert len(elemList[1]) == len(elemList[3]), "** ERROR: The length of Sequence data and Quality data of the last record aren't equal.\n\
               Please check FastQ file near line number %s (plus or minus ~4 lines) and try again**" % (self._currentLineNumber) 

        # ++++ Return fastQ data as tuple ++++
        return tuple(elemList)
##########################################################################

This is the code that will not work when I run it in the new script; the problem has to do with how the pieces are put together:

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Process fastq files and separate into 4 categories')
    parser.add_argument("-f",  "--fastq", required=True, help="Place fastq inside here")
    args = parser.parse_args()

    fastqfile = ParseFastQ(args.fastq)

I tried the following, but I cannot get fastqfile, which should contain a tuple of the form (seqHeader,seqStr,qualHeader,qualStr).

Attempt:

parser.add_argument("-/Users/remaining_dir/test1.fastq",  "--fastq", required=True, help="Place fastq inside here")

Error:

argument -/Users/remaining_dir/test1.fastq/--fastq: conflicting option string: --fastq

Attempt:

parser.add_argument("-/Users/remaining_dir/test1.fastq",  "-@", required=True, help="Place fastq inside here")

Out[332]:

_StoreAction(option_strings=['-/Users/remaining_dir/test1.fastq', '-@'], dest='/Users/remaining_dir/test1.fastq', nargs=None, const=None, default=None, type=None, choices=None, help='Place fastq inside here', metavar=None)

next line:

Error:

usage:  [-h] -/Users/remaining_dir/test1.fastq
        /USERS/REMAINING_DIR/TEST1.FASTQ
: error: the following arguments are required: -/Users/remaining_dir/test1.fastq/-@
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

When I ran %tb, the following information was given:
 File "/Users/brownbear/opt/anaconda3/lib/python3.7/argparse.py", line 2508, in error
    self.exit(2, _('%(prog)s: error: %(message)s\n') % args)

  File "/Users/brownbear/opt/anaconda3/lib/python3.7/argparse.py", line 2495, in exit
    _sys.exit(status)

If helpful, here is some sample fastq data:

@seq13534-419
GCAGTAGCGGTCATAAGTGGTACATTACGAGATTCGGAGTACCATAGATTCGCATGAATCCCTGTGGATACGAGAGTGTGAGATATATGTACGCCAATCCAGTGTGATACCCATGAGATTTAGGACCGATGATGGTTGAGGACCAAGGATTGACCCGATGGATGCAGATTTGACCCCAGATAGAATAAATGCGATGAGATGATTTGGCCGATAGATAGATAGTGTCGTGAGGTGACGTCCGTCACTGGACGAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFDFFDFFDDFDFDFFFFDDFFDDFDDFF
@seq86249-867
GGATTAGCGGTCATAAGTCGTACATTACGAGATTCGGAGTACCATAGATTCGCATGAATCCCTGTGGATACGAGAGTGTGAGATATATGTACGCCAATCCAGTGTGATACCCATGAGATTTAGGACCGATGATGGTTGAGGACCAAGGATTGACCCGATGGATGCAGATTTGACCCCAGATAGAATAAATGCGATGAGATGATTTGGCCGATAGATAGATAGAGGTCAGTATAACCTCTCAAAGCTTTATCTACGGATGGATCCGCGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDFDDDDDDFFDFDDFDDDFDFFDDFFFFFFFFFDDFDFFDDFDDF
@seq46647-928
GACCTAGCGGTCATAAGTGGTACATTACGAGATTCGGAGTACCATAGATTCGCATGAATCCCTGTGGATACGAGAGTGTGAGATATATGTACGCCAATCCAGTGTGATACCCATGAGATTTAGGACCGATGATGGTTGACGACCAAGGATTGACCCGATGGATGCAGATTTGACCCCAGATAGAATAAATGCGATGAGATGATTTGGCCGATAGATAGATAGTAAGTAAATGCCACGGACTCGTCACGTG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDDFDFDFFFFFDFFDFDFDDDDDFDFF

Any help would be appreciated on why this works when I run the script from the command line but not when I try to incorporate it into another script.

  • It's unclear what you mean when you say `This is the code that will not work when calling it in the same script; it has to do with putting the pieces in` and immediately after show examples of trying other things. Did you try the first code snippet? What error or output does it give, if any? Is it indented exactly as you have shown? – Axe319 May 27 '20 at 10:02
  • Script A is the original script and can be called from the command line. All contents of Script A will be copied to Script B. The errors occur when running Script B line by line. The class runs smoothly; however, when you attempt to use argparse to add arguments, it fails: – AAA.BioInfo May 27 '20 at 10:05
  • Is the code snippet with `if __name__ == "__main__":` a part of script A? And does it work in Script A if so? – Axe319 May 27 '20 at 10:09
  • @Axe319 it is and it runs fine – AAA.BioInfo May 27 '20 at 10:11
  • The reason this `parser.add_argument("-/Users/remaining_dir/test1.fastq", "--fastq", required=True, help="Place fastq inside here")` won't work is 1. You already have an argument with a `"--fastq"` flag and 2. I'm not sure what this `"-/Users/remaining_dir/test1.fastq"` is trying to do but I'm 99% sure you don't want that. What exactly are you attempting to do with `argparse` in your new script? – Axe319 May 27 '20 at 10:16
  • Take this working example `parser.add_argument("-f", "--fastq", required=True, help="Place fastq inside here")`. All this is doing is saying that from the command line you can specify either `python my_script.py -f /my/file/path` or `python my_script.py --fastq /my/file/path` and then after you parse the arguments with `args = parser.parse_args()` the `args.fastq` variable will hold whatever string you passed to it. In this example `/my/file/path`. – Axe319 May 27 '20 at 10:29
  • `parser.add_argument("-/Users/remaining_dir/test1.fastq", "--fastq", required=True, help="Place fastq inside here")` This is odd because in order for `args.fastq` to contain `/Users/remaining_dir/test1.fastq` you would need to call `python my_script.py -/Users/remaining_dir/test1.fastq /Users/remaining_dir/test1.fastq` but even that wouldn't work because you already have an argument with `--fastq` as an option earlier in your script. – Axe319 May 27 '20 at 10:33
  • @Axe319 I am trying to feed two arguments into the ParseFastQ class: 1.) the absolute location of the fasta file I want processed and 2.) something letting it know it is a fastq file. Therefore, when I run fastqfile=ParseFastQ(args.fastq), it will open up the fastq file, parse the components, and return to me a fastq_obj that is a tuple of length 4 corresponding to the 4 lines of a fastq record. – AAA.BioInfo May 27 '20 at 10:33
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/214729/discussion-between-aaa-bioinfo-and-axe319). – AAA.BioInfo May 27 '20 at 10:38
  • Your `argparse` description is rather confusing, but the "conflicting option string: --fastq" error arises because you attempted to create a "--fastq" argument twice. – hpaulj May 27 '20 at 17:18
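
The conflict described in the comments above can be reproduced in isolation. This is a minimal sketch, not the asker's script: the second flag name `-x` and its help text are hypothetical and exist only to trigger the same error.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-f", "--fastq", required=True, help="Place fastq inside here")

try:
    # Hypothetical second argument that reuses "--fastq"; argparse rejects it
    # as soon as add_argument is called, before any parsing happens.
    parser.add_argument("-x", "--fastq", help="hypothetical duplicate")
    conflict_message = None
except argparse.ArgumentError as err:
    conflict_message = str(err)

print(conflict_message)
```

Each option string can be registered only once per parser, which is why re-running an `add_argument` line in an interactive session against the same parser object fails.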

2 Answers

To answer your question as I understand it, you could simply add another argument to the parser, like so:

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Process fastq files and separate into 4 categories')
    parser.add_argument("-f",  "--fastq", required=True, help="Place fastq inside here")
    parser.add_argument("-t",  "--type", required=True, help="The type of file")
    args = parser.parse_args()

    print(args.fastq)
    print(args.type)

And then call it like so.

python3 parseFastq.py --fastq /Users/remaining_dir/test1.fastq --type fastq
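
Note that `args.fastq` is only the path string; the four-field tuples come from iterating over the parser object built from it. The sketch below illustrates that flow under some assumptions: `read_fastq_records` is a hypothetical stand-in for the `ParseFastQ` class (without its validation), the two records are made-up sample data, and an explicit argument list is passed to `parse_args` only so the example runs without a command line.

```python
import argparse
import os
import tempfile


def read_fastq_records(path):
    """Hypothetical stand-in for ParseFastQ: yield one 4-tuple per record."""
    with open(path, "r") as handle:
        while True:
            lines = [handle.readline().rstrip("\n") for _ in range(4)]
            if not lines[0]:  # readline() returned '' at end of file
                return
            yield tuple(lines)


# Write a tiny two-record FASTQ file so the sketch is self-contained.
sample = "@seq1\nGCAG\n+\nIIII\n@seq2\nGGAT\n+\nIIDF\n"
with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as tmp:
    tmp.write(sample)

parser = argparse.ArgumentParser(description="Process fastq files")
parser.add_argument("-f", "--fastq", required=True, help="Place fastq inside here")
parser.add_argument("-t", "--type", required=True, help="The type of file")
# The explicit list stands in for: --fastq <path> --type fastq
args = parser.parse_args(["--fastq", tmp.name, "--type", "fastq"])

# args.fastq is just a string; each record appears only when we iterate.
records = list(read_fastq_records(args.fastq))
for seq_header, seq_str, qual_header, qual_str in records:
    print(seq_header, len(seq_str), len(qual_str))

os.remove(tmp.name)
```

With the real class, the equivalent loop would be `for rec in ParseFastQ(args.fastq): ...`, as its docstring describes.
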
Axe319

The solution came down to two main points.

I was trying to run the argparse code from an IDE (Spyder), and I was running only selected code as opposed to the whole script.

For those who are new to Python and are using argparse for the first time: by default, argparse reads its arguments from the command line (sys.argv), so the script has to be run as a whole with its arguments supplied, which is simplest from a terminal.

Therefore, once you've defined your arguments, run the script as shown here.

From the command line:

python3 parseFastq.py --fastq test1.fastq 

To break this down further: the initial setup associates your test1.fastq file with the flag --fastq. This is critical. If you get an error saying the argument is required, it is because the flag and its value have to be supplied as a pair. In this particular example, you can also use the shorthand "-f". Therefore, it could also be run as follows.

From the command line:

python3 parseFastq.py -f test1.fastq 

As long as you run your script from the directory that contains your input files, you do not need the full path.
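
If you do want to test the parsing step from inside an IDE console such as Spyder, one option is to pass the argument list to `parse_args` explicitly instead of relying on `sys.argv`. This is a minimal sketch under that assumption; `build_parser` is a hypothetical helper name, and `test1.fastq` is only used as a string here, so the file does not need to exist.

```python
import argparse


def build_parser():
    """Hypothetical helper that recreates the parser from the question."""
    parser = argparse.ArgumentParser(description="Process fastq files")
    parser.add_argument("-f", "--fastq", required=True, help="Place fastq inside here")
    return parser


parser = build_parser()

# Called with no arguments, parse_args() reads sys.argv[1:]. Passing a list
# supplies the same flag/value pairs directly, which works in a console.
long_form = parser.parse_args(["--fastq", "test1.fastq"])
short_form = parser.parse_args(["-f", "test1.fastq"])
print(long_form.fastq, short_form.fastq)

# Leaving out the required flag makes argparse print a usage message and call
# sys.exit(2), which an interactive console reports as "SystemExit: 2".
try:
    parser.parse_args([])
    exit_code = None
except SystemExit as exc:
    exit_code = exc.code
print(exit_code)
```

Both spellings of the flag fill the same `args.fastq` attribute, and the final `try` block shows where the `SystemExit: 2` in the question comes from.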