
Here is my mrjob.conf:

runners:
  emr:
    aws_access_key_id: **
    aws_secret_access_key: **
    aws_region: us-east-1
    aws_availability_zone: us-east-1a
    ec2_key_pair: scrapers2
    ec2_key_pair_file: ~/arachnid.pem
    ec2_instance_type: c3.8xlarge
    ec2_master_instance_type: c3.8xlarge
    num_ec2_instances: 3
    python_bin: python2.6
    interpreter: python2.6
    ami_version: 2.4.11
    iam_job_flow_role: EMR_DefaultRole
    jobconf: {"mapred.task.timeout": 600000, "mapred.output.direct.NativeS3FileSystem": false}
    base_tmp_dir: /tmp
    enable_emr_debugging: true
    cmdenv:
        TZ: America/New_York
    s3_log_uri: s3://mrjob-lists/tmp/logs/
    s3_scratch_uri: s3://mrjob-lists/tmp/
    output_dir: s3://mrjob-lists/output
    ssh_tunnel_is_open: true
    ssh_tunnel_to_job_tracker: true

I am using EMR to run the job, and my mapper task has:

print "test"

as well as

sys.stdout.write("TEst")

However, I cannot find this output in the stdout files on S3. Where is the output written?
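
For context, the mapper presumably sits in a job along these lines (a minimal sketch; the class name, input handling, and the yielded key/value pair are assumptions, not from the original post):

from mrjob.job import MRJob
import sys

class MRTestJob(MRJob):

    def mapper(self, _, line):
        # debugging output written to the task's stdout
        print "test"
        sys.stdout.write("TEst\n")
        # normal mapper output
        yield line, 1

if __name__ == '__main__':
    MRTestJob.run()

Run with -r emr, this picks up the runners: emr: section of the mrjob.conf shown above.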

1 Answer

The mapper's stdout for a Hadoop 1 job (which is what AMI 2.4.11 runs) should appear in the S3 logs under /task-attempts/job_#####_##/attempt_#####_##_##/stdout.gz.

It can take a little while for these logs to be pushed to S3. If you leave the cluster running, you can also check the Hadoop JobTracker web interface just after the job finishes to confirm that the output shows up in the local task logs as well.
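
Once the logs have been pushed, you could locate and pull down the stdout files under the configured s3_log_uri (s3://mrjob-lists/tmp/logs/ in the question) with something like the following sketch using boto; the exact job flow and attempt IDs will differ:

import boto

# uses the same AWS credentials configured in mrjob.conf
conn = boto.connect_s3()
bucket = conn.get_bucket('mrjob-lists')

# task-attempt logs typically land under <s3_log_uri>/<job flow id>/task-attempts/
for key in bucket.list(prefix='tmp/logs/'):
    if 'task-attempts' in key.name and key.name.endswith('stdout.gz'):
        print key.name
        # e.g. key.get_contents_to_filename('/tmp/' + key.name.replace('/', '_'))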
