
I have a Python script that processes a huge text file (around 4 million lines) and writes the data into two separate files.

For debugging, I have added a print statement that outputs a string for every line. How bad could this be from a performance perspective?

If it is going to be very bad, I can remove the debugging line.

Edit

It turns out that having a print statement for every line in a file with 4 million lines increases the running time far too much.

Sudar
    `timeit` http://docs.python.org/2/library/timeit.html – wim Nov 08 '12 at 11:29
  • It will be slower as you are having to perform a large number of prints, any extra processing is going to incur some performance penalty. – Matt Seymour Nov 08 '12 at 12:09
    Send `item` to a socket queue : the program will finish the writes first, and the console from the socket will print the output with a lag. – ajsp Jul 07 '20 at 14:04
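The `timeit` suggestion from the comments can be sketched like this (Python 3 syntax; the loop size, repeat count, and the `os.devnull` target are arbitrary choices to keep the measurement quick and the console clean):

```python
import timeit

# Time a tight loop that prints each item, sending the output to
# os.devnull so the console itself doesn't dominate the measurement.
with_print = timeit.timeit(
    "for i in range(1000): print(i, file=devnull)",
    setup="import os; devnull = open(os.devnull, 'w')",
    number=10,
)

# The same loop with no print at all, for comparison.
without_print = timeit.timeit(
    "for i in range(1000): pass",
    number=10,
)

print("with print:    %.4fs" % with_print)
print("without print: %.4fs" % without_print)
```

Even with the console out of the picture, the per-call overhead of `print` shows up clearly in the first timing.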

2 Answers


I tried doing it in a very simple script just for fun; the difference is quite staggering:

In large.py:

# Python 2 (note xrange and the print statement)
target = open('target.txt', 'w')

for item in xrange(4000000):
    target.write(str(item) + '\n')
    print item

Timing it:

[gp@imdev1 /tmp]$ time python large.py
real    1m51.690s
user    0m10.531s
sys     0m6.129s

[gp@imdev1 /tmp]$ ls -lah target.txt 
-rw-rw-r--. 1 gp gp 30M Nov  8 16:06 target.txt

Now running the same with "print" commented out:

[gp@imdev1 /tmp]$ time python large.py 
real    0m2.584s
user    0m2.536s
sys     0m0.040s
GSP
    And when you comment out the write, leave in the print, and run with `> target.txt` ? – Tim Nov 08 '12 at 12:23
  • @Tim: Oddly enough it ran faster, but my machine may simply be less busy than it was earlier; I don't have time right now to run it repeatedly for a sounder statistical comparison. `[gp@imdev1 /tmp]$ time python large.py > target.txt` → real 0m1.954s, user 0m1.897s, sys 0m0.049s – GSP Nov 08 '12 at 12:26
    redirecting stdout to a file will be much faster, in fact you can direct to a file and open the file in an editor in less time than it takes to spew a large amount of io to the screen. – agentp Nov 08 '12 at 13:01
  • @GSP Thanks. It looks like I should remove the print statements. – Sudar Nov 08 '12 at 14:58
  • I was also wondering whether a verbose option that checks a bool before printing hurts the time much. Using `if False: print item` it ran in **1.417s**, and without any print it ran in **1.357s**. – akozi Nov 15 '18 at 16:35
  • you could also print only every 1,000th item using modulo (`if i % 1_000 == 0: print(i, item)` inside the loop) to show progress – ATH Mar 06 '21 at 23:39
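The last two comments amount to the same idea: gate the print on a cheap condition so only a handful of progress lines are emitted. A minimal sketch (Python 3; the file name and `progress_step` value are arbitrary choices):

```python
def write_items(path, n, progress_step=100_000):
    """Write n numbered lines to path, printing progress
    only once every progress_step items instead of per line."""
    with open(path, 'w') as target:
        for i in range(n):
            target.write(str(i) + '\n')
            if i % progress_step == 0:
                print('processed', i)

write_items('target.txt', 1_000_000)
```

With the defaults this prints ten progress lines instead of a million, so nearly all of the per-line `print` overhead disappears while still showing that the job is alive.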

Yes, it affects performance. I wrote a small program to demonstrate:

import time

start_time = time.time()
for i in range(100):
    for j in range(100):
        for k in range(100):
            print(i, j, k)
print(time.time() - start_time)
input()

The time measured was 160.2812204496765 seconds. Then I replaced the print statement with `pass`. The results were striking: the measured time without print was 0.26517701148986816 seconds.
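If the output itself is actually needed, a common mitigation (a sketch, not part of the answer above) is to collect the lines and write them in a single call rather than issuing one `print` per iteration:

```python
import sys
import time

start = time.time()
lines = []
for i in range(100):
    for j in range(100):
        for k in range(100):
            lines.append('%d %d %d' % (i, j, k))

# One large write instead of 1,000,000 separate print calls.
sys.stdout.write('\n'.join(lines) + '\n')
sys.stderr.write('elapsed: %.3fs\n' % (time.time() - start))
```

The work and the output are identical; only the number of I/O calls changes, which is where most of the per-print cost lives.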

Kshitij Joshi