[Image: System Monitor during process]

I am a novice when it comes to programming. I've worked through the book Practical Computing for Biologists and am playing around with some slightly more advanced concepts.
I've written a Python (2.7) script which reads in a .fasta file and calculates GC-content. The code is provided below.
The file I'm working with is cumbersome (~3.9 Gb), and I was wondering whether there's a way to take advantage of multiple processors, and whether it would be worthwhile. I have a four-core (hyperthreaded) Intel Core i7-2600K processor.
I ran the code and looked at system resources (picture attached) to see what the load on my CPU is. Is this process CPU-limited, or is it I/O-limited? These concepts are pretty new to me. I played around with the multiprocessing module and Pool(), to no avail (probably because my function returns a tuple).
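One rough way to tell the two cases apart is to time a pass that only reads the file against a pass that reads it and also counts bases. The sketch below is only an illustration under assumptions: the helper names `read_only`, `read_and_count`, and `timed` are hypothetical and not part of the script in this post. If the two timings are close, the job is probably dominated by reading from disk; if the counting pass is much slower, the CPU work matters more.

```python
from __future__ import division, print_function

import time


def read_only(path):
    """Read every line and discard it; returns the number of lines read."""
    n_lines = 0
    with open(path, 'r') as handle:
        for _ in handle:
            n_lines += 1
    return n_lines


def read_and_count(path):
    """Read every line and count G/C on sequence lines; returns (gc, total)."""
    gc = 0
    total = 0
    with open(path, 'r') as handle:
        for line in handle:
            if line.startswith('>'):   # FASTA header line, not sequence
                continue
            seq = line.strip()         # drop the trailing newline
            gc += seq.count('G') + seq.count('C')
            total += len(seq)
    return gc, total


def timed(func, path):
    """Run func(path) and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = func(path)
    return result, time.time() - start


# Hypothetical usage with the file name from this post:
#   _, t_read = timed(read_only, "WS_Genome_v1.fasta")
#   _, t_count = timed(read_and_count, "WS_Genome_v1.fasta")
#   print(t_read, t_count)
```

One caveat: the operating system may keep part of the file cached in RAM after the first pass, which can make the second pass look faster than it would be from a cold start, so it is worth repeating the runs in both orders before drawing conclusions.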
Here's my script:
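def GC_calc(InFile):
    Iteration = 0
    GC = 0
    Total = 0
    for Line in InFile:
        Seq = Line.strip()  # drop the newline so it is not counted as a base
        # Skip blank lines and FASTA header lines (they start with ">").
        if Seq and Seq[0] != ">":
            GC = GC + Seq.count('G') + Seq.count('C')
            Total = Total + len(Seq)
        Iteration = Iteration + 1
        print Iteration
    # 100.0 forces float division; in Python 2.7, 100 * GC / Total truncates.
    GCC = 100.0 * GC / Total
    return (GC, Total, GCC)

InFileName = "WS_Genome_v1.fasta"
InFile = open(InFileName, 'r')
results = GC_calc(InFile)
InFile.close()
print results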
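
For the multiprocessing part of the question, here is a minimal sketch of one possible approach, not a definitive implementation. Returning a tuple from a `Pool` worker is fine in itself; a more likely obstacle is that an open file object cannot be sent to worker processes. The sketch therefore reads the file in the parent process and hands lists of lines to the workers. The names `count_chunk`, `read_chunks`, and `gc_calc_parallel`, and the defaults `workers=4` and `lines_per_chunk=100000`, are assumptions chosen for illustration.

```python
from __future__ import division, print_function

from itertools import islice
from multiprocessing import Pool


def count_chunk(lines):
    """Worker: count G/C and total bases in a list of lines."""
    gc = 0
    total = 0
    for line in lines:
        if line.startswith('>'):   # FASTA header line, not sequence
            continue
        seq = line.strip()         # drop the trailing newline
        gc += seq.count('G') + seq.count('C')
        total += len(seq)
    return gc, total


def read_chunks(handle, lines_per_chunk):
    """Yield successive lists of at most lines_per_chunk lines."""
    while True:
        chunk = list(islice(handle, lines_per_chunk))
        if not chunk:
            return
        yield chunk


def gc_calc_parallel(path, workers=4, lines_per_chunk=100000):
    """Sum per-chunk counts from a pool; returns (gc, total, percent)."""
    gc = 0
    total = 0
    pool = Pool(workers)
    try:
        with open(path, 'r') as handle:
            chunks = read_chunks(handle, lines_per_chunk)
            # Order does not matter because the counts are simply added up.
            for chunk_gc, chunk_total in pool.imap_unordered(count_chunk, chunks):
                gc += chunk_gc
                total += chunk_total
    finally:
        pool.close()
        pool.join()
    percent = 100.0 * gc / total if total else 0.0
    return gc, total, percent


# Hypothetical usage; the guard matters on platforms that start workers
# by re-importing the main module (for example Windows):
#   if __name__ == '__main__':
#       print(gc_calc_parallel("WS_Genome_v1.fasta"))
```

Whether this is faster than the single-process loop is not guaranteed. `str.count` already runs in compiled code, every chunk has to be pickled and copied to a worker, and chunks can pile up in memory if reading outpaces counting, so on a file this size the disk may well be the limit rather than the CPU.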