
I am getting this error in my mapper class. I am reading a big zip file using ZipFileInputFormat, which unzips it, and using ZipFileRecordReader to turn each file into a key (the file name) and a value (the file's contents). I then have to split the content on my delimiter and insert it into an HBase table. The zip file is very large and is not splittable. My code works for smaller zip files, but when I run it on a huge zip file it throws this error. This is where the problem occurs:

        // Read the file contents
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] temp = new byte[8192];
        while ( true )
        {
            int bytesRead = 0;
            try
            {
                bytesRead = zip.read( temp, 0, 8192 );
            }
            catch ( EOFException e )
            {
                if ( ZipFileInputFormat.getLenient() == false )
                    throw e;
                return false;
            }
            if ( bytesRead > 0 )
                bos.write( temp, 0, bytesRead );
            else
                break;
        }

I tried increasing 8192 to a much bigger number, but I still get the same error. This is how I run my MapReduce job:

    hadoop jar bulkupload-1.0-jar-with-dependencies.jar -Dmapreduce.map.memory.mb=8192 -Dmapreduce.map.java.opts=Xmx7372m FinancialLineItem FinancialLineItem sudarshan/output39

In my mapper code I iterate over the content of the file, split it, and then insert it into HBase.

NOTE: The file is very large.

Sudarshan kumar

4 Answers


It simply means that the JVM ran out of memory. When this occurs, you basically have 2 choices:

  • Allow the JVM to use more memory using the -Xmx VM argument, for instance -Xmx1024m to allow the JVM to use 1 GB (1024 MB) of memory.
  • Improve/fix the application so that it uses less memory.
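
If you go with the first option for the job in the question, here is a sketch of the invocation (the values are illustrative, not a recommendation): the map-task heap is normally raised via mapreduce.map.java.opts, whose value is the JVM flag itself, e.g. -Xmx7372m, while mapreduce.map.memory.mb should stay larger than that heap so the container has room for non-heap memory:

    hadoop jar bulkupload-1.0-jar-with-dependencies.jar -Dmapreduce.map.memory.mb=8192 -Dmapreduce.map.java.opts=-Xmx7372m FinancialLineItem FinancialLineItem sudarshan/output39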

ManjuSony
  • I tried 1024 MB from the command line, but I get the same issue. The size of the file in compressed form is 1.2 GB. – Sudarshan kumar Oct 14 '16 at 05:22
  • @SUDARSHAN If the file is 1.2 GB compressed and has a compression ratio of 1:5, then you need a heap of at least 6 GB (probably much more, since there is other stuff in your heap, and ByteArrayOutputStream also doubles its internal byte array when growing; it might be better to size it correctly up front to avoid that). – eckes Oct 17 '20 at 20:46

Well, you seem to be reading a large file into memory. You would expect that to cause an OOME. You need to stop holding the whole file in memory at once.

Enno Shioji
  • @SUDARSHAN: First of all, you need to stop writing it into a byte array. Does the file have some unit, a chunk, that can be processed and written on its own? E.g. one line if it's a text file, or one record if it's in a binary format. – Enno Shioji Oct 16 '16 at 09:56
  • 1
    @SUDARSHAN: In that case, use a buffered input stream or something to read line by line, process line by line and write it before you read the next line. – Enno Shioji Oct 16 '16 at 16:44
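
A minimal sketch of the approach from the comments, assuming `zip` is the ZipInputStream that ZipFileRecordReader reads the current entry from, the entry is newline-delimited text, and DELIMITER is a hypothetical stand-in for the question's (unspecified) delimiter:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.ZipInputStream;

    public class StreamingZipRead
    {
        // Hypothetical delimiter; substitute the question's actual one.
        private static final String DELIMITER = "\\|";

        // Reads the current zip entry line by line instead of buffering the
        // whole decompressed entry into a ByteArrayOutputStream.
        public static void processCurrentEntry( ZipInputStream zip ) throws IOException
        {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader( zip, StandardCharsets.UTF_8 ) );
            String line;
            while ( ( line = reader.readLine() ) != null )
            {
                String[] fields = line.split( DELIMITER );
                // Build the HBase Put from `fields` and write it out here,
                // so only one line is held in memory at a time.
            }
            // Deliberately not closing `reader`: that would also close `zip`,
            // which ZipFileRecordReader still needs for the remaining entries.
        }
    }

Only one line is held on the heap at a time, so memory use no longer grows with the size of the entry.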

According to the error, I believe it's not about the size of the zip file itself but about the fact that the uncompressed data is stored in memory. All the data is written into a ByteArrayOutputStream, which has to maintain an internal byte array; as it grows, at some point it will run out of memory.

I'm not familiar with the purpose of the code, but I guess the best solution would be to write the data to a temporary file, perhaps memory-map it, and then do the operations on that.
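
A rough sketch of that idea, assuming `zip` is again the ZipInputStream positioned at the current entry; the copy uses a small fixed buffer, so the heap footprint stays constant regardless of how large the uncompressed entry is:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.ZipInputStream;

    public class SpillToTempFile
    {
        // Copies the current zip entry to a local temporary file using a small
        // fixed buffer, so heap usage stays constant however big the entry is.
        public static File spillCurrentEntry( ZipInputStream zip ) throws IOException
        {
            File tmp = File.createTempFile( "zip-entry-", ".tmp" );
            try ( OutputStream out = new FileOutputStream( tmp ) )
            {
                byte[] buffer = new byte[8192];
                int bytesRead;
                while ( ( bytesRead = zip.read( buffer ) ) != -1 )
                {
                    out.write( buffer, 0, bytesRead );
                }
            }
            return tmp; // the caller can re-read or memory-map this file afterwards
        }
    }

The temporary file can then be memory-mapped or re-read in whatever chunks the rest of the job needs.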

Zbynek Vyskovsky - kvr000

Is your file stored in HDFS? If not, you can put it into HDFS and then run a job that simply loads the zip and stores its contents in some other location. Then you can run your job against this new location, and the old zipped location can be discarded. The file size you are specifying is, I guess, that of the zipped file, which after unzipping will be much larger.

SurjanSRawat
  • Yes, my files are in HDFS. I am loading the content into a temp location. The problem here is that it is not able to hold one file, which is 1.2 GB uncompressed, in memory. Also, I am running a for loop over each line of the content, so that might also be the problem. But how do I overcome that? – Sudarshan kumar Oct 15 '16 at 03:31