
I am trying to discover text encoded in ASCII found within a DNA sequence from a file.

Below is my code:

The first step is to open the FASTA file and store its contents in a variable.

with open("/home/<username>/python/progseq") as mydnaseq:
    sequence = mydnaseq.read().replace('\n','')

The second step translates the sequence into binary. I replaced A with 0 as shown, and did the same for the other letters so that C and G/T equal 1:

binarysequence = sequence.replace('A','0')

Then I took this very long binary sequence and wanted to split it into 8-bit chunks:

for i in range(0,len(binarysequence),8):
    binarysequence[i:i+8]

This then created an output like this:

    '00110100'
    '00110010'
    '01000110'
    '00011000'
    '0'

The full output was much longer; I have only included the last four bytes (plus the leftover bit) here.

I would like to know how to convert these into letters.

  • You can convert an ASCII string into binary (bytes) with `sequence.encode()`. The function replaces every character with its 8-bit ASCII character code, e.g. 'A' becomes 65. But what do you plan to do next with these bits? – DYZ Dec 07 '16 at 01:32
  • Hello @DYZ, thanks for replying. I wanted to replace each of these series of 8 bits (not just these four) with the respective ASCII character, as I am told it is supposed to reveal a poem. I just don't know how yet and was wondering whether to use encode() or decode(), or whether there is a different approach. I hope I am making sense. I am new to the programming world. – Young Autobot Dec 07 '16 at 16:34
  • I shouldn't have used ord() but translate each letter ([A,C=0][T,G=1]) accordingly. I'm just unsure where to start with this now. – Young Autobot Dec 07 '16 at 16:55

1 Answer

With e.g. a = '10010000', you can do

b = chr(int(a, 2))

This first parses a as a base-2 integer and then interprets that integer as a character code. Note that many 8-bit values do not correspond to readable characters!

A concrete example is

b = chr(int('01111000', 2))
print(b)

which results in 'x' being printed.
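To apply the same conversion to a long bit string, one option is to walk over it in 8-bit chunks. The following is only a minimal sketch: the helper name `bits_to_text` and its `width` parameter are hypothetical, and it assumes that an incomplete trailing chunk (such as a single leftover '0') should simply be ignored.

```python
def bits_to_text(bits, width=8):
    """Decode a string of '0'/'1' characters into text, `width` bits at a time."""
    # Assumption: a trailing chunk shorter than `width` is dropped.
    usable = len(bits) - len(bits) % width
    chars = []
    for i in range(0, usable, width):
        chunk = bits[i:i + width]
        # int(chunk, 2) parses the chunk as base 2; chr() maps it to a character.
        chars.append(chr(int(chunk, 2)))
    return ''.join(chars)


# The last chunks shown in the question, including the leftover '0'.
sample = '00110100' + '00110010' + '01000110' + '00011000' + '0'
print(repr(bits_to_text(sample)))  # prints '42F\x18'
```

The last decoded character here is a control character (code 24), which illustrates that not every 8-bit value maps to something readable.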

jmd_dk
  • Thanks so much for replying. I tried doing b = chr(int(binarysequence,2)) but Python told me it was too large. I'm trying to avoid manually inputting each byte as I have too many and I'm sure there's a simpler way. I'm just unsure what it is yet. Thanks again. – Young Autobot Dec 07 '16 at 18:55
  • What do you mean *manually*? Why not just use `chr(int(binarysequence[i:i+8], 2))` in your loop and maybe append the result to a list as you go? – jmd_dk Dec 07 '16 at 19:04
  • Thanks for your suggestion. It really helped! :) – Young Autobot Dec 07 '16 at 19:14
  • I think I've stumbled upon another issue. It wasn't the output I expected (it is supposed to be a poem), but it did give me a few letters, so I'm thinking I may need to use binascii. I'm still reading into it. – Young Autobot Dec 07 '16 at 19:39
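For reference, the whole pipeline discussed in this thread could be sketched as below. This is an illustration, not a confirmed solution: it assumes the mapping mentioned in the comments on the question (A and C become 0, T and G become 1), and the names `DNA_TO_BIT` and `dna_to_text` are hypothetical. If the real encoding uses a different mapping, the translation table would need to change accordingly.

```python
# Assumed mapping, taken from the comments: A, C -> 0 and T, G -> 1.
DNA_TO_BIT = str.maketrans('ACTG', '0011')


def dna_to_text(sequence):
    """Translate a DNA string to bits, then decode the bits 8 at a time."""
    bits = sequence.upper().translate(DNA_TO_BIT)
    # Drop any incomplete chunk at the end of the bit string.
    usable = len(bits) - len(bits) % 8
    return ''.join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))


# 'ATGATACA' -> 01101000 -> 'h', 'CGTAGCAT' -> 01101001 -> 'i'
print(dna_to_text('ATGATACACGTAGCAT'))  # prints hi
```

Characters other than A, C, G and T (for example a FASTA header line) are left untranslated by `str.translate`, so `int(..., 2)` would raise a `ValueError` on them; the input needs to contain only the sequence itself.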