EMR cluster running slow

Question

I was running a Hadoop MapReduce job on Amazon EMR 5.5.2, which uses Hadoop 2.7.3.

I recently upgraded EMR to 5.12.1, which uses Hadoop 2.8.0.

For the same input load, my new cluster is running comparatively very slow.

I have not been able to find the cause. I may need to tune some performance parameters.

The MapReduce job counters are below. Looking at them, does anybody have any insight into which performance parameters might be misconfigured?

Job Counters

File System Counters    
    FILE: Number of bytes read=1087
    FILE: Number of bytes written=24787084
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=15840
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=132
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=0
    S3N: Number of bytes read=0
    S3N: Number of bytes written=4315
    S3N: Number of read operations=0
    S3N: Number of large read operations=0
    S3N: Number of write operations=0
Job Counters 
    Launched map tasks=132
    Launched reduce tasks=7
    Other local map tasks=132
    Total time spent by all maps in occupied slots (ms)=1576936320
    Total time spent by all reduces in occupied slots (ms)=26894720
    Total time spent by all map tasks (ms)=2463963
    Total time spent by all reduce tasks (ms)=42023
    Total vcore-milliseconds taken by all map tasks=2463963
    Total vcore-milliseconds taken by all reduce tasks=42023
    Total megabyte-milliseconds taken by all map tasks=50461962240
    Total megabyte-milliseconds taken by all reduce tasks=860631040
Map-Reduce Framework
    Map input records=12523
    Map output records=2
    Map output bytes=3236
    Map output materialized bytes=15935
    Input split bytes=15840
    Combine input records=0
    Combine output records=0
    Reduce input groups=1
    Reduce shuffle bytes=15935
    Reduce input records=2
    Reduce output records=8
    Spilled Records=4
    Shuffled Maps =924
    Failed Shuffles=0
    Merged Map outputs=924
    GC time elapsed (ms)=64327
    CPU time spent (ms)=2737480
    Physical memory (bytes) snapshot=166237839360
    Virtual memory (bytes) snapshot=2760473792512
    Total committed heap usage (bytes)=187218526208
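
To make the counters easier to compare between the old and new cluster, here is a minimal Python sketch that derives a few per-task figures from them. The names `parse_counters`, `derive`, and `SAMPLE` are hypothetical helpers written for this post, not part of Hadoop or EMR, and the sample only repeats a subset of the counters listed above.

```python
def parse_counters(text):
    """Parse 'name=value' counter lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        # Split on the last '=' so names containing ':' or '(ms)' stay intact.
        name, sep, value = line.strip().rpartition("=")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


# Subset of the counters from this job, copied from the listing above.
SAMPLE = """
Launched map tasks=132
Launched reduce tasks=7
Total time spent by all map tasks (ms)=2463963
Total vcore-milliseconds taken by all map tasks=2463963
Total megabyte-milliseconds taken by all map tasks=50461962240
Map input records=12523
"""


def derive(c):
    """Compute simple per-task averages from the parsed counters."""
    maps = c["Launched map tasks"]
    return {
        "avg_map_task_ms": c["Total time spent by all map tasks (ms)"] / maps,
        "avg_records_per_map": c["Map input records"] / maps,
        # Allocated memory (MB) per allocated vcore for map containers.
        "mb_per_map_vcore": (
            c["Total megabyte-milliseconds taken by all map tasks"]
            / c["Total vcore-milliseconds taken by all map tasks"]
        ),
    }


metrics = derive(parse_counters(SAMPLE))
for key, value in metrics.items():
    print(f"{key}: {value:.1f}")
```

With these figures, each map task averages about 18.7 seconds and processes roughly 95 input records, and map containers are allocated 20480 MB per vcore. Running the same calculation on the counters from the old cluster would show whether the task count, per-task time, or container size changed after the upgrade.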

What do you mean by "running comparatively very slow"? What type of jobs are you running, and how are you measuring the speed (or is it just from memory of the old cluster)? Is your data coming from S3 or HDFS? Did you change instance types or the number of nodes? — John Rotenstein, Jun 06 '18 at 23:26
