
I am running a MapReduce job with the mrjob library, and I want to record its execution time in a JSON file.

I record the time with this code:

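from datetime import datetime
import sys

if __name__ == '__main__':
    start_time = datetime.now()
    MRJobClass.run()  # MRJobClass stands for the job's MRJob subclass
    end_time = datetime.now()
    elapsed_time = end_time - start_time
    # write() only accepts a str, so convert the timedelta first
    sys.stderr.write(str(elapsed_time) + '\n')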

I have to print the time to stderr because that is the only method that works for me.

I cannot use the following code to write to a JSON file, because my code runs in distributed mode:

import json

data = {}
data["step1"] = str(elapsed_time)  # store the duration as a string
with open('time.json', 'w') as outfile:
    json.dump(data, outfile)
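Independently of the distributed-mode problem, json.dump cannot serialize a datetime.timedelta directly; it raises TypeError. A minimal sketch of the two usual conversions, using a fixed, hypothetical duration so the output is predictable:

```python
import json
from datetime import timedelta

elapsed_time = timedelta(seconds=1, microseconds=4999)  # hypothetical measured duration

# Passing the timedelta itself fails: it is not a JSON-serializable type.
try:
    json.dumps({"step1": elapsed_time})
except TypeError as exc:
    print("TypeError:", exc)

# Option 1: store the duration as a number of seconds (easy to sum later).
as_seconds = json.dumps({"step1": elapsed_time.total_seconds()})
# Option 2: store the human-readable form produced by str().
as_text = json.dumps({"step1": str(elapsed_time)})

print(as_seconds)  # {"step1": 1.004999}
print(as_text)     # {"step1": "0:00:01.004999"}
```

Seconds as a float are easier to add up across steps; the string form is easier to read by eye.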

How can I write the elapsed time to a JSON file in a local folder using sys.stderr.write()?

huy

1 Answer


json.dump writes to a file-like object, meaning any object that has a .write(str) method. sys.stderr is a file-like object, so you can dump the data straight to it:

from datetime import datetime
import sys
import time
import json

start_time = datetime.now()
time.sleep(1)
elapsed_time = datetime.now() - start_time
data = {"step1":str(elapsed_time)}
json.dump(data,sys.stderr)

Output:

{"step1": "0:00:01.004999"}
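Because json.dump only needs an object with a .write() method, the same call works with an in-memory buffer such as io.StringIO or with an open file. A minimal sketch, with a fixed, hypothetical duration so the result is predictable; it also ends each record with a newline so that every record sits on its own line:

```python
import io
import json
import sys
from datetime import timedelta

data = {"step1": str(timedelta(seconds=1, microseconds=4999))}

# Any object with a .write(str) method can be the target of json.dump.
buffer = io.StringIO()
json.dump(data, buffer)
buffer.write("\n")  # newline-terminate the record

# The same call targeting sys.stderr, as in the answer's example.
json.dump(data, sys.stderr)
sys.stderr.write("\n")

text = buffer.getvalue()
print(text, end="")  # {"step1": "0:00:01.004999"}
```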
Mark Tolonen
  • Thanks for your answer; it does write the time to stderr in JSON format. How can I save it to a local file from there? – huy May 16 '21 at 06:01
  • @huy I'm not sure what you mean. "save to local" is what you were doing before...open a file and write to it. – Mark Tolonen May 16 '21 at 06:02
  • My program reads its input file from HDFS and saves the output to HDFS, so it does not use any local file. I want to save the time into a `json` file in the local directory so that other jobs can use this file and keep adding their times. – huy May 16 '21 at 06:04
  • @huy Full disclosure I know nothing about hadoop/mrjob so you'll have to be more specific. But a JSON file is not appendable unless you use a format like [JSON Lines](https://jsonlines.org/), where each line of the file is a complete JSON object. – Mark Tolonen May 16 '21 at 06:07
  • Anyway, thank you for your answer; I think I can get some ideas from this. – huy May 16 '21 at 06:09
  • The computer I use to execute this code is called Node Master. In Hadoop distributed mode, when I run a MapReduce job, Hadoop ships this code to many computers and executes it there, so I cannot open and write the JSON file on my Node Master. – huy May 16 '21 at 06:15
  • So using `sys.stderr.write` or your method, I can watch it from the terminal on Node Master. – huy May 16 '21 at 06:16
  • Now I just want to write that into a file on Node Master. – huy May 16 '21 at 06:16
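Following up on the JSON Lines suggestion, here is a minimal sketch of appending one timing record per step to a local file. It assumes the timing code runs in a single process on the Node Master (the machine that launches the job), not inside the mapper or reducer tasks that Hadoop runs on other nodes. The function names, record fields, and file path are hypothetical.

```python
import json
import time
from pathlib import Path


def append_timing(path, step, seconds):
    """Append one timing record as a single JSON line (JSON Lines format)."""
    record = {"step": step, "seconds": seconds}
    # Mode "a" keeps the records that earlier jobs already wrote.
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_timings(path):
    """Read every record back; each non-empty line is a complete JSON object."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def timed(path, step, func):
    """Run func() in this process, then append how long it took to path."""
    start = time.perf_counter()
    result = func()
    append_timing(path, step, time.perf_counter() - start)
    return result
```

For example, `timed("job_times.jsonl", "step1", run_job)`, where `run_job` is a hypothetical function that launches the job. mrjob's documentation describes launching a job from a separate driver script through the job's `make_runner()` method (`with job.make_runner() as runner: runner.run()`); wrapping that call would time the whole job from the Node Master. Storing `seconds` as a float rather than a string makes it simple to total the times across jobs later.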