
Here is the code:

    def readFasta(filename):
        """ Reads a sequence in Fasta format """
        fp = open(filename, 'rb')
        header = ""
        seq = ""
        while True:
            line = fp.readline()
            if (line == ""):
                break
            if (line.startswith('>')):
                header = line[1:].strip()
            else:
                seq = fp.read().replace('\n','')
                seq = seq.replace('\r','')          # for windows
                break
        fp.close()
        return (header, seq)

    FASTAsequence = readFasta("MusChr01.fa")

The error I'm getting is:

TypeError: startswith first arg must be bytes or a tuple of bytes, not str

But the first argument to startswith is supposed to be a string according to the docs... so what is going on?

I'm assuming I'm using at least Python 3 since I'm using the latest version of LiClipse.

icedwater
user2287873

3 Answers


It's because you're opening the file in binary mode ('rb'), so every line you read is a bytes object and you're calling bytes.startswith(), not str.startswith().

You need to do line.startswith(b'>'), where the b prefix makes '>' a bytes literal.
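To illustrate the difference, here is a minimal sketch. The in-memory stream and its contents are made up for demonstration and only stand in for a file opened with 'rb':

```python
import io

# Hypothetical sample data standing in for a FASTA file opened with 'rb'.
fp = io.BytesIO(b">chr1 demo\nACGT\nTTGA\n")
line = fp.readline()           # a bytes object, because the stream is binary

print(type(line))              # <class 'bytes'>
print(line.startswith(b'>'))   # True: bytes prefix tested against bytes

try:
    line.startswith('>')       # str prefix tested against bytes
except TypeError as exc:
    print("TypeError:", exc)   # the same kind of error as in the question
```

The same applies to every other literal compared against the data. In particular, readline() returns b'' at end of file in binary mode, and b'' is not equal to '', so an end-of-file check written as line == "" never fires.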

TerryA
  • Ah I added b before all the strings and now it works. Thanks! – user2287873 Nov 07 '13 at 03:48
  • Hmm on a sidenote, seq = fp.read().replace(b'\n',b'') seems to be messing up the stuff that's read. Not sure what's going on but it only seems to be iterating twice (in a 190mb file) and outputting b' each time. – user2287873 Nov 07 '13 at 04:00
  • This is not backwards compatible. – Cerin May 03 '17 at 16:36
  • The problem is on first argument, not the second, so `line.startswith(b'>')` cannot possibly solve it. `bytes(line).startswith('>')`, on the other hand, could. – mpiskore May 14 '17 at 20:19
  • @mpiskore I used (based on this answer) `line.endswith(b'\n')` and I think it works well. – mirek Nov 01 '22 at 22:41

If you want to keep opening the file in binary mode, replacing 'STR' with bytes('STR'.encode('utf-8')) works for me.
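A small sketch of what the encode approach above produces, using '>' as the example string (my choice for illustration, not taken from this answer):

```python
prefix = '>'.encode('utf-8')            # str -> bytes
print(prefix)                           # b'>'
print(prefix == b'>')                   # True: same value as the bytes literal

# encode() already returns bytes, so wrapping it in bytes() changes nothing.
print(bytes(prefix) == prefix)          # True

print(b'>chr1\n'.startswith(prefix))    # True
```

Encoding at run time like this is mostly useful when the string comes from a variable. For a fixed string, the b'...' literal gives the same result.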

wenching

I don't have your file to test with, but try opening it in text mode and specifying UTF-8 as the encoding in the call to open:

    fp = open(filename, 'r', encoding='utf-8')

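For context, a rough sketch of how the reader from the question might look in text mode. The name read_fasta_text and the temporary demo file are hypothetical, and the sketch assumes the file holds a single record:

```python
import os
import tempfile

def read_fasta_text(filename):
    """Read a single-record FASTA file in text mode (illustrative sketch)."""
    header = ""
    parts = []
    # Text mode yields str lines, so plain str literals work with startswith().
    with open(filename, 'r', encoding='utf-8') as fp:
        for line in fp:
            if line.startswith('>'):
                header = line[1:].strip()
            else:
                # strip() drops the trailing newline (and any stray '\r')
                parts.append(line.strip())
    return header, "".join(parts)

# Demo on a throwaway file with made-up contents.
with tempfile.NamedTemporaryFile('w', suffix='.fa', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(">chr1 demo\nACGT\nTTGA\n")
    path = tmp.name

header, seq = read_fasta_text(path)
os.remove(path)
print(header)   # chr1 demo
print(seq)      # ACGTTTGA
```

Collecting the lines in a list and joining them once avoids repeated string concatenation, which can matter for a file of the size mentioned in the comments.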
Andre Odendaal