I'm trying to build an array where each row contains the k-mers (nucleotide strings of length k) found in a different sequence. I've read that NumPy arrays can't really start out empty and be grown row by row, and I haven't been able to get append to work either.
    bases = ['A', 'T', 'C', 'G']
    self.profile = np.array([])
    for x in range(1):
        k = self.ksize
        kmer = [''.join(p) for p in itertools.product(bases, repeat=k)]
        for i in range(0, len(self.motifs)):
            for q in range(0, len(kmer)):
                if kmer[q] in self.motifs[i]:
                    self.kmers.append(kmer[q])
            self.profile[i] = self.kmers
The error I get here is: "IndexError: index 0 is out of bounds for axis 0 with size 0"
I realize this is because I did not specify the shape of the array. However, I only know how many rows there will be, not how many columns: the row length depends on how many k-mers are found in each sequence, so it can differ from row to row.
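To make the shape problem concrete, here is a minimal sketch of one possible workaround when only the row count is known: a 1-D NumPy array with `dtype=object`, preallocated with one slot per sequence, where each slot holds a Python list of whatever length. The function name `build_object_profile` and the sample sequences are made up for illustration and are not part of my class.

```python
import itertools
import numpy as np

def build_object_profile(motifs, k, bases=("A", "T", "C", "G")):
    """Return a 1-D object array; entry i lists the k-mers found in motifs[i]."""
    # Every possible k-mer over the alphabet, in itertools.product order.
    all_kmers = ["".join(p) for p in itertools.product(bases, repeat=k)]
    # Preallocate one slot per sequence; each slot can hold a list of any length.
    profile = np.empty(len(motifs), dtype=object)
    for i, seq in enumerate(motifs):
        # Build a new list for each sequence so rows are not shared.
        profile[i] = [kmer for kmer in all_kmers if kmer in seq]
    return profile

# Hypothetical input: two short sequences, k = 2.
profile = build_object_profile(["ATG", "CC"], k=2)
```

Because the rows have different lengths, this is not a true 2-D array, so vectorized column operations will not work on it; it is only a container indexed by row.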
If I try to make it a 'list of lists':
    bases = ['A', 'T', 'C', 'G']
    self.profile = list()
    for x in range(1):
        k = self.ksize
        kmer = [''.join(p) for p in itertools.product(bases, repeat=k)]
        for i in range(0, len(self.motifs)):
            for q in range(0, len(kmer)):
                if kmer[q] in self.motifs[i]:
                    self.kmers.append(kmer[q])
            self.profile[i] = self.kmers
I just get:
    self.profile[i] = self.kmers
    IndexError: list assignment index out of range
Is there a better way to do this?
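For reference, here is a rough sketch of what I think the list-of-lists version is supposed to look like. It assumes the problem is that an empty list has no index `i` to assign to, so each row has to be added with `append`, and it builds a new list for each sequence instead of reusing one shared list. The name `kmers_per_sequence` and the sample input are hypothetical.

```python
import itertools

def kmers_per_sequence(motifs, k, bases=("A", "T", "C", "G")):
    """Return a list of lists; row i holds the k-mers found in motifs[i]."""
    # Every possible k-mer over the alphabet, in itertools.product order.
    all_kmers = ["".join(p) for p in itertools.product(bases, repeat=k)]
    profile = []
    for seq in motifs:
        # Start a new row for every sequence; rows may differ in length.
        found = [kmer for kmer in all_kmers if kmer in seq]
        # append grows the outer list; profile[i] = ... would fail on an empty list.
        profile.append(found)
    return profile

# Hypothetical input: two short sequences, k = 2.
rows = kmers_per_sequence(["ATG", "CC"], k=2)
```

With this layout `rows[i]` corresponds to the i-th sequence, and `len(rows[i])` gives the number of distinct k-mers found in it.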