I am a beginner and want to make a barcode out of this DNA sequence using Python. The code is supposed to read the sequence 1024 nucleotides at a time and count the mers in each chunk (a mer is a combination of 4 nucleotides, e.g. AAAA, AAAC, AAAG, AAAT, ..., TTTT). Each mer has an index in an array of size 256: if AAAA is found within the first 1024 nucleotides, its count is stored at its index, and so on. The same is then done for the next 1024 nucleotides until the whole sequence has been processed. That creates a 2D array, which will be turned into a grayscale PNG.
My problem is that it only took the first 1024 nucleotides and displayed that result across the entire 1024x256 image.
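For reference, here is a minimal sketch of the windowed counting described above, not a definitive implementation. The function name `kmer_count_rows` and its parameters are made up for illustration. It assumes non-overlapping windows, counts only mers that lie fully inside a window, drops a trailing partial window, and skips any mer containing a character other than A, C, G or T. Columns follow the AAAA, AAAC, AAAG, ... order listed above.

```python
from itertools import product


def kmer_count_rows(sequence, window=1024, k=4, alphabet="ACGT"):
    """Return one row of len(alphabet)**k counts per full window of the sequence."""
    # Map every possible k-mer to a column index (AAAA -> 0, AAAC -> 1, ...).
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    rows = []
    # Step through the sequence one non-overlapping window at a time.
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        row = [0] * len(index)
        # Slide a k-wide frame across the window, one base at a time.
        for i in range(len(chunk) - k + 1):
            col = index.get(chunk[i:i + k])
            if col is not None:  # assumption: ignore mers with N or other symbols
                row[col] += 1
        rows.append(row)
    return rows
```

Calling `kmer_count_rows(dna_sequence)` would give a list with one 256-element row per 1024-nucleotide window, which is the 2D shape the barcode needs.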
DNA: https://1drv.ms/f/s!AuXxv7yqjA_FlS_ujYOMUvikWg8E
#read the DNA sequence
fasta_file = open(r'C:path\Escherichia_coli_ATCC_10798.fasta','r')
SE = fasta_file.read()
fasta_file.close()
seq = SE[177:]
dna_sequence = seq.replace("\n","")
# Sample size and mer length
# The sample is the window that will go through the whole sequence
sample_size = 1024
mer_length = 4
# Array to store the counts of each mer
barcode = [0] * 256
# Generate all possible 4-mers
mers = []
for i in range(256):
    mer = ""
    for j in range(4):
        mer += "ACGT"[i % 4]
        i //= 4
    mers.append(mer)
# Loop through the sample and count the occurrences of each mer
for w in range(sample_size):
    mer = dna_sequence[w:w+mer_length]
    barcode[mers.index(mer)] += 2
# Print the counts of each mer
#print(mers[i], ":", barcode[i])
print(barcode)
# image
# Python program to convert numpy array to image
# import pillow library
from PIL import Image
import numpy as np
# define a main function
# Create the barcode array with the same shape as the desired image
code = np.array(barcode, dtype=np.uint8)
# Create an Image object from the barcode array
image = Image.fromarray(code)
# Resize the image to the desired size (1024x4000)
image = image.resize((1024, 4000))
# Save the image
image.save('Escherichia_coli.png')
# Display the image (optional)
image.show()
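The image step above builds the picture from a single 256-element list and then stretches it with `resize`. A hedged sketch of the 2D version follows. It assumes `rows` is a list of 256-element count lists, one per 1024-nucleotide window. The helper names `scale_to_gray` and `save_barcode` are hypothetical. Raw counts are usually far below 255, which likely contributes to a dark image, so the sketch rescales them so the largest count maps to white.

```python
def scale_to_gray(rows):
    """Linearly rescale integer counts so the largest count maps to 255."""
    peak = max((max(row) for row in rows if row), default=0)
    if peak == 0:
        # Nothing was counted: return an all-black image of the same shape.
        return [[0] * len(row) for row in rows]
    return [[count * 255 // peak for count in row] for row in rows]


def save_barcode(rows, path):
    """Write the scaled counts as a grayscale PNG, one image row per window."""
    # numpy and Pillow are imported here so scale_to_gray stays dependency-free.
    import numpy as np
    from PIL import Image

    pixels = np.array(scale_to_gray(rows), dtype=np.uint8)  # shape: (windows, 256)
    Image.fromarray(pixels).save(path)
```

With this layout the image is 256 pixels wide and one pixel tall per window. If the windows should run along the horizontal axis instead, transposing the array (`pixels.T`) before `Image.fromarray` would swap the two.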
The dark image is what I got; the other one is what I was supposed to get.
[image] [image: my output]
I don't know how to attach the DNA sequence. Some info: genome_id="531534bd23a542ae", atcc_catalog_number="ATCC 10798", species="Escherichia coli".
link to similar genome: