
My program is showing a CPU running time longer than the time the program was actually running, with no parallelization written in the code.

The code is written mostly in Fortran 90 (with one or two features from later standards that I added) and compiled with my Linux machine's native gfortran compiler (`--version` information: GNU Fortran (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)). I understand that gfortran supports later standards than Fortran 90.

When the program starts, it executes `call cpu_time(time_start)`, and just before it ends, it executes `call cpu_time(time_end)`. The difference `time_end - time_start` then gives the elapsed CPU time in seconds.
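A minimal sketch of this timing pattern, not the actual program: it brackets a placeholder loop with `cpu_time` and, for comparison, also records wall-clock time with the standard `system_clock` intrinsic. The program name, the placeholder workload, the double-precision kinds, and the `count_*` variables are all assumptions made for illustration. 64-bit integers are used for the tick counts so that the counter is unlikely to wrap during a multi-day run.

```fortran
program timing_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)            ! assumed: double precision for timings
  integer, parameter :: i8 = selected_int_kind(18)  ! assumed: 64-bit tick counter

  real(dp)    :: time_start, time_end               ! CPU seconds reported by cpu_time
  integer(i8) :: count_start, count_end, count_rate ! wall-clock ticks from system_clock
  real(dp)    :: x
  integer     :: i

  call cpu_time(time_start)
  call system_clock(count_start, count_rate)

  ! Placeholder workload standing in for the real computation (hypothetical)
  x = 0.0_dp
  do i = 1, 10000000
     x = x + sqrt(real(i, dp))
  end do

  call cpu_time(time_end)
  call system_clock(count_end)

  print *, 'workload result     :', x
  print *, 'CPU time (s)        :', time_end - time_start
  print *, 'wall-clock time (s) :', real(count_end - count_start, dp) / real(count_rate, dp)
end program timing_sketch
```

Printing both numbers from inside the program gives a direct comparison that does not depend on the scheduler's log timestamps.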

So here's the curious thing: I used HTCondor to submit my code to run on whatever machine in my local network had an available CPU. My HTCondor log files show the job was submitted 07/24 14:17:46, started running 15 seconds later, then ran to completion on that same machine, ending 07/30 11:01:52, a wall-clock time of less than six days. However, `time_end - time_start` says that the CPU time was 993535 seconds, or over 11 days, which is nearly twice the wall-clock time. Since my code is not parallelized at all, how can this be?
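The arithmetic behind these figures can be checked with a short sketch. It assumes the job started exactly 15 seconds after submission and that both timestamps fall in the same month; the program and variable names are hypothetical.

```fortran
program elapsed_check
  implicit none
  integer :: wall_start, wall_end, wall_elapsed
  real    :: cpu_elapsed

  ! Seconds counted from 07/24 00:00:00
  wall_start = 14*3600 + 17*60 + 46 + 15        ! 07/24 14:17:46 submit, plus 15 s
  wall_end   = 6*86400 + 11*3600 + 1*60 + 52    ! 07/30 11:01:52, six days later
  wall_elapsed = wall_end - wall_start          ! 506631 seconds

  cpu_elapsed = 993535.0                        ! reported by time_end - time_start

  print *, 'wall-clock seconds :', wall_elapsed
  print *, 'wall-clock days    :', real(wall_elapsed) / 86400.0      ! about 5.86
  print *, 'CPU days           :', cpu_elapsed / 86400.0             ! about 11.50
  print *, 'CPU / wall ratio   :', cpu_elapsed / real(wall_elapsed)  ! about 1.96
end program elapsed_check
```

The reported CPU time is therefore about 1.96 times the wall-clock time recorded in the HTCondor log.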

I've run this code hundreds of times before and never noticed this phenomenon, but I've never checked closely either.

Edit: I wish to note once again that my code is not parallelized, at least not explicitly. I do compile with the `-O3` flag, but I don't think this introduces parallelization. If the linked question/answer about parallel Fortran does indeed answer my question about a serial process, please help me understand how, because I do not see the connection.

My HTCondor submission script is as follows. I condor_submit this script and that's how I run the code.

executable     = /path/to/executable
universe       = standard
log            = condorlog.log
output         = condorstdout.out
error          = condorerror.out
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue
NeutronStar
  • @AlexanderVogt, if you can point out the possible duplicate I would appreciate it. My code is *not* parallelized, at least not explicitly. – NeutronStar Aug 01 '16 at 16:58
  • *at least not explicitly* what does that mean? – Vladimir F Героям слава Aug 01 '16 at 16:58
  • @VladimirF, I never wrote anything parallel in my code. Perhaps the compiler is doing some parallelization (I use the `-O3` flag) but I don't know, that's part of what I am hoping to have answered here. If the question you linked answers my question, please explain how. I don't see the connection. – NeutronStar Aug 01 '16 at 17:02
  • You should show some code and show how you run it. – Vladimir F Героям слава Aug 01 '16 at 17:03
  • @VladimirF, by this are you saying the explanation probably lies with the way the code is written specifically, rather than a general-level thing that can be answered with what I've put so far? The code is rather long and extracting a MWE may take a long time. – NeutronStar Aug 01 '16 at 17:19
  • It could be anywhere: in the code, in the way you are compiling, or in the way you are running it. An MCVE is a must. Does it do this if you run the code locally? Does it do this for a simple program over Condor? Have you read the question linked before? It explained that CPU time and wall time are not the same thing. I can't vote on this question any more, otherwise I would vote to close for not having an MCVE. – Vladimir F Героям слава Aug 01 '16 at 17:23
  • You should first make sure that the behavior is reproducible, which, based on your own observations, I do not think it is. If it is not reproducible, it could be anything from your own mistake to memory corruption, among other possibilities. – innoSPG Aug 01 '16 at 18:02
