I need help making a program that creates a text file containing a randomly sequenced genome built from the letters 'A', 'C', 'T' and 'G'. The end goal is to produce about a million randomly sequenced genomes, then use another program to search them for known patterns that lead to specific diseases. I'd then gather the statistics from my Python code and compare them with realistic ones.
-
This is a dupe: https://stackoverflow.com/q/30205962/2988730 – Mad Physicist Apr 03 '18 at 01:38
-
Not only have you not tried anything, but you didn't even bother googling first. – Mad Physicist Apr 03 '18 at 01:39
1 Answer
If I understand correctly, producing a random human genome would be quite straightforward. The following would produce a random genome of 10 bases (just to show an example):
import numpy as np
random_genome = np.random.choice(list('ACTG'), 10)
>>> random_genome
array(['C', 'A', 'C', 'C', 'G', 'C', 'A', 'C', 'C', 'C'],
dtype='<U1')
You can wrap this in a simple function like this:
def create_genome(n=1000000):
    return np.random.choice(list('ACTG'), n)
So that you can define how long you want your genome to be using the argument n.
As you're looking to streamline your code: I timed the above function, and it took approximately 1.2 seconds to create 100 random genomes of 1,000,000 letters each.
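If you want to reproduce a timing like this on your own machine, something along these lines should work. This is only a sketch using the standard-library `timeit` module; the answer does not say which tool was actually used, and your numbers will differ depending on hardware and NumPy version.

```python
import timeit

import numpy as np


def create_genome(n=1000000):
    # Same function as in the answer: n random draws from the four bases.
    return np.random.choice(list('ACTG'), n)


# Total wall-clock time, in seconds, for 100 calls with the default n.
elapsed = timeit.timeit(create_genome, number=100)
print(f"100 genomes of 1,000,000 bases each: {elapsed:.2f} s")
```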
EDIT: If your goal is to write to a .txt file rather than work with your random genomes in Python, it might be best to join your genome into a single string first:
def create_genome(n=1000000):
    return ''.join(np.random.choice(list('ACTG'), n))
So you can easily and quickly write it to file:
random_genome = create_genome()  # a plain str now, not an array
with open('filename.txt', 'w') as f:
    f.write(random_genome)
It will take longer to generate a random genome this way, but if your goal is to have a .txt file, this might be better than saving it as an np.array.
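Since the question mentions producing many genomes, here is one way the pieces above could be combined to write several genomes to a single file, one per line. Treat it as a sketch: `write_genomes`, the file name `genomes.txt`, and the one-genome-per-line layout are my assumptions, not something the question specifies.

```python
import numpy as np


def create_genome(n=1000000):
    # String version from the answer: join the sampled bases into one str.
    return ''.join(np.random.choice(list('ACTG'), n))


def write_genomes(path, count, n):
    # Hypothetical helper: write `count` genomes of length `n`, one per line,
    # generating them one at a time so only one is held in memory.
    with open(path, 'w') as f:
        for _ in range(count):
            f.write(create_genome(n))
            f.write('\n')


# Small sizes here just to show the call; scale count and n up as needed.
write_genomes('genomes.txt', count=5, n=50)
```

Generating and writing one genome at a time keeps memory use flat, which matters if you really do go up to a million sequences.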

-
I suppose not, you could use `random`, but I don't see the issue with using `numpy` (and I suppose it's faster). What were you thinking? – sacuL Apr 03 '18 at 01:39
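For reference, a version that uses only the standard-library `random` module might look like the sketch below. The name `create_genome_stdlib` is made up for illustration, and I have not benchmarked it against the numpy version.

```python
import random


def create_genome_stdlib(n=1000000):
    # random.choices (Python 3.6+) samples with replacement; k is the length.
    return ''.join(random.choices('ACTG', k=n))


print(create_genome_stdlib(10))
```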
-
@jhpratt. You don't really *need* numpy for anything. For that matter you don't really need computers at all. But it's really handy sometimes, and makes it worth your time to do the extra import. – Mad Physicist Apr 03 '18 at 01:41
-
That being said, if you're going to use numpy, make sure you're creating an array of one char ascii type – Mad Physicist Apr 03 '18 at 01:42
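One possible reading of this suggestion is to sample from an array of one-byte strings (dtype `S1`) instead of the default `<U1`, which stores each character in 4 bytes. The sketch below is my interpretation, not code from the commenter; `BASES` and `create_genome_bytes` are hypothetical names.

```python
import numpy as np

# Four one-byte elements; dtype 'S1' uses 1 byte per base instead of 4.
BASES = np.array([b'A', b'C', b'T', b'G'], dtype='S1')


def create_genome_bytes(n=1000000):
    # Sample n bases and return the raw bytes, ready for a binary-mode write.
    return np.random.choice(BASES, n).tobytes()


genome = create_genome_bytes(20)
with open('genome_bytes.txt', 'wb') as f:  # note 'wb', since genome is bytes
    f.write(genome)
```

This also skips the `''.join(...)` step, since `tobytes()` already gives you the sequence as one contiguous bytes object.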