3

I am reading some large XML files (around 800 MB each) and storing their contents in a database.

The program stores many records and then terminates with an exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.IdentityHashMap.resize(Unknown Source)
    at java.util.IdentityHashMap.put(Unknown Source)

Analyzing the .hprof heap dump with Memory Analyzer (MAT), the leak suspects report says:

  76,581 instances of "java.lang.String", loaded by "<system class loader>" occupy 1,043,445,504 (98.76%) bytes. 

Keywords: java.lang.String

I have setters and getters for holding the parsed values. How do I resolve this issue? Any help would be appreciated.


I have already increased the memory through the JRE .ini file, but the problem isn't solved.

EDIT: I am using scireumOpen to read XML files.

Example code I have used:

public void readD() throws Exception {

        XMLReader reader = new XMLReader();

        reader.addHandler("node", new NodeHandler() {

            @Override
            public void process(StructuredNode node) {
                try {
                    // obj is a field of the enclosing class (a simple bean
                    // with getters and setters); its declaration is not shown.
                    obj.setName(node.queryString("name"));

                    save(obj);

                } catch (XPathExpressionException xPathExpressionException) {
                    xPathExpressionException.printStackTrace();
                } catch (Exception exception) {
                    exception.printStackTrace();
                }
            }
        });

        reader.parse(new FileInputStream("C:/Users/some_file.xml"));

    }

    public void save(Reader obj) { // Reader is the OP's own bean type, not java.io.Reader

        try {
            EntityTransaction entityTransaction = em.getTransaction();
            entityTransaction.begin();

            Entity e1 = new Entity(); // was "new Entity;", which does not compile
            e1.setName(obj.getName());

            em.persist(e1);
            entityTransaction.commit();

        } catch (Exception exception) {
            exception.printStackTrace();
        }
    }
Shiv
  • 4,569
  • 4
  • 25
  • 39
  • 1
    How do you read those XML files ? – AllTooSir Jul 11 '13 at 06:32
  • 1
    Is it possible for you to replace all Strings with StringBuilder (in the reading of those XML files)? If yes, then you've got the solution – Freak Jul 11 '13 at 06:34
  • @TheNewIdiot using scriumOpen and JPA to store – Shiv Jul 11 '13 at 06:35
  • 1
  • @freak This will not really help, as an 800 MB XML file will take at least 1.6 GB in memory (2 bytes per character) without counting any overhead. Even `StringBuilder` will be at its limit. – Uwe Plonus Jul 11 '13 at 06:38
  • @UwePlonus yea agreed – Freak Jul 11 '13 at 06:38
  • @everyone - will StringBuffer help? – Shiv Jul 11 '13 at 06:43
  • @Shiv Yes, StringBuffer and StringBuilder can both help, but StringBuffer is slower due to its thread-safe behaviour. But this suggestion is second priority. First of all, you need to change your parser: SAX, StAX or JAXB. If the problem still persists, then think about changing the Strings – Freak Jul 11 '13 at 06:44
  • @Shiv the difference between `StringBuilder` and `StringBuffer` in this case is nonexistent. – Uwe Plonus Jul 11 '13 at 06:44
  • 1
    @freak, Uwe Plonus, The New Idiot: what can be done, because I have already increased my JVM memory to 2048 MB? – Shiv Jul 11 '13 at 06:47
  • I am unable to understand why you are not considering Uwe Plonus's answer. OK, hold on. Just tell us whether you are using any parser, or just answer `How do you read those XML files?` (the very first comment, by The New Idiot) – Freak Jul 11 '13 at 06:51
  • @freak he wrote he using `seriumOpen` or `scriumOpen` ... thing. – user1516873 Jul 11 '13 at 06:55
  • @user1516873 seriumOpen is not a parser. Also, it is not in my or even Google's knowledge – Freak Jul 11 '13 at 06:58
  • @freak- I am using scireumOpen to read xml files through this link http://java.dzone.com/articles/conveniently-processing-large – Shiv Jul 11 '13 at 07:00
  • @Shiv Agreed. It is a parser, and better than SAX too. Now try to replace all the strings and increase the heap size – Freak Jul 11 '13 at 07:06
  • Replace all strings with what? And to what extent can I increase the heap size? I have already increased it to around 2048m. 64-bit OS – Shiv Jul 11 '13 at 07:09
  • 1
    @Shiv you don't do something like put everything you parse in one map for subsequent processing? Can you show some code? – user1516873 Jul 11 '13 at 07:15
  • @user1516873- See edited question for code – Shiv Jul 11 '13 at 07:28
  • 1
    @Shiv Replace it with StringBuffer or StringBuilder. Also, I guess 2048m is OK, but if the issue still persists, change it to 4096m or even try 6000m – Freak Jul 11 '13 at 07:48
  • @freak Increasing the heap size is working; let me parse the whole XML and I will get back to you – Shiv Jul 11 '13 at 08:04
  • @freak-Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded – Shiv Jul 11 '13 at 08:32
  • How much RAM do you have? – Freak Jul 11 '13 at 08:45
  • @Shiv So I guess you need to vote my answer too – Freak Jul 15 '13 at 05:49

8 Answers

5

Try using another parser for XML processing.

Processing one big XML file of 800 MB with e.g. DOM is not feasible, as it takes up a great deal of memory.

Try using SAX or StAX in Java and process the parsing results immediately, without trying to load the complete XML file into memory.

Also, don't keep the parsing results in memory in total. Write them into the database as fast as possible and keep the scope of your parsing results as narrow as possible.

Perhaps use intermediate tables in the database and do the processing on the complete dataset inside the database.
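
A minimal StAX sketch of that approach (assuming the same file path and the <name> element from the question; the database write is only indicated by a comment):

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreamingImport {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader xml = factory.createXMLStreamReader(
                new FileInputStream("C:/Users/some_file.xml"));
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.START_ELEMENT
                    && "name".equals(xml.getLocalName())) {
                String name = xml.getElementText(); // only this one value is held in memory
                // save 'name' to the database here, then let it go out of scope
            }
        }
        xml.close();
    }
}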

Uwe Plonus
  • 9,803
  • 4
  • 41
  • 48
  • @TheNewIdiot - He never even said he was using `DOM`—how do you know this is the right answer? – DaoWen Jul 11 '13 at 06:38
  • @DaoWen I didn't say the correct **answer** , I said the correct **suggestion** . – AllTooSir Jul 11 '13 at 06:39
  • This is exactly the solution. Parse it using SAX, commit the records to database as you go (or in batches of 100, say), and don't keep the whole freakin' file in memory at once when there isn't the slightest need to do so. – Thomas W Jul 11 '13 at 06:41
  • @DaoWen DOM was an assumption from my side, changed answer therefore. The critical point is in the next paragraph: Don't load the complete XML file at once. – Uwe Plonus Jul 11 '13 at 06:41
  • @UwePlonus - He actually never even mentioned parsing—he might just be reading the file in as a string and then dumping it in the database. However, this does seem like the best suggestion since you can _stream_ the input instead of slurping it all into memory. (Based on the "will StringBuffer help?" comment I'm thinking he isn't using a parser.) – DaoWen Jul 11 '13 at 06:43
  • @DaoWen I guess I should believe on this dark reality :D – Freak Jul 11 '13 at 06:48
  • @DaoWen, the OP also didn't say that he did NOT use a SAX parser, so this is the most sensible and obvious suggestion if `-Xmx` doesn't help. – Devolus Jul 11 '13 at 06:51
  • @Devolus - I totally agree — I was just pointing out that the wording of the answer assumes he's using a parser (originally assumed he was using DOM), but that doesn't seem to be the case. I have a feeling he's going to be pretty lost when he goes to look at SAX—but that's OK, he'll learn something hopefully/eventually... – DaoWen Jul 11 '13 at 06:54
  • @DaoWen, Lost or not, if he wants to resolve his problem, he has only a limited number of options. :) It probably means a total redesign of the code. – Devolus Jul 11 '13 at 06:58
  • I am using scireumOpen to read xml files through this link java.dzone.com/articles/conveniently-processing-large – Shiv Jul 11 '13 at 07:03
  • @Shiv it looks like the dzone article points in the right direction, but you're selecting the wrong nodes. Perhaps you have to write your own `SAXHandler` or use `StAX` to solve your problem. – Uwe Plonus Jul 11 '13 at 07:15
2

Your heap is limited and cannot hold such a big XML document in memory. Try to increase the heap size using the -Xmx JRE option.
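
For example, to start the program with a 4 GB heap (the class name here is only a placeholder):

java -Xmx4g com.example.XmlImporter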

or

try http://vtd-xml.sourceforge.net/ (VTD-XML) for faster and lighter XML processing.

Juned Ahsan
  • 67,789
  • 12
  • 98
  • 136
1
  1. The most obvious answer: increase your JVM memory, as has already been mentioned here, using java -XmxNN.
  2. Use a SAX parser instead of a DOM tree (if you don't do this already). This depends on your application design, so you have to look into it and see if this is a possible strategy.
  3. Check your code and try to remove all objects which are not needed, so that they can be reclaimed by the GC. This can include e.g. moving variable declarations inside a loop instead of having them outside of it, so that the references are dropped early, and setting elements to null when you no longer need them (see the sketch below).

Without knowing your code, these are only general guidelines.
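
A small sketch of point 3; Record, parseNext() and store() are placeholders, not part of the OP's code:

class ScopeDemo {

    // Wide scope: 'record' outlives the loop and keeps the last parsed
    // object reachable until the enclosing method returns.
    void wideScope(int count) {
        Record record = null;
        for (int i = 0; i < count; i++) {
            record = parseNext();
            store(record);
        }
        // 'record' still references the last object here
    }

    // Narrow scope: the reference dies with each iteration, so every
    // Record becomes eligible for GC as soon as it has been stored.
    void narrowScope(int count) {
        for (int i = 0; i < count; i++) {
            Record record = parseNext();
            store(record);
        }
    }

    static class Record {}
    Record parseNext() { return new Record(); }
    void store(Record r) {}
}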

Devolus
  • 21,661
  • 13
  • 66
  • 113
  • Possible, depends on how you access it. With a StringBuffer you can at least pass around the same object, so you might create fewer references. The problem with strings is that they are immutable, and in this regard a StringBuffer can really help. – Devolus Jul 11 '13 at 06:45
1

My main tip: check your JPA code once again. It should be as isolated as possible.

An idea would be to use JAXB with annotations. An IdentityHashMap (keys compared with == instead of equals) is a rare thing; it likely comes from JPA, or maybe from XML tag handling. You could also look at which XML parser is actually used (inspect the factory class, or list all XML parser providers via the Java SPI, the service provider interface).
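
A quick way to see which implementations the SPI lookup actually picks (a sketch of the "inspect the factory class" idea):

import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;

public class WhichParser {
    public static void main(String[] args) {
        // Prints the provider classes chosen for SAX and StAX respectively.
        System.out.println(SAXParserFactory.newInstance().getClass().getName());
        System.out.println(XMLInputFactory.newInstance().getClass().getName());
    }
}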

You could share strings, for instance all strings shorter than 20 characters, using a Map<String, String>:

private Map<String, String> sharedStrings = new HashMap<>();

// Returns a canonical instance for short strings so duplicate copies
// can be garbage collected.
private String shareString(String s) {
    if (s == null || s.length() > 20) {
        return s;
    }
    String t = sharedStrings.get(s);
    if (t == null) {
        t = s;
        sharedStrings.put(t, t);
    }
    return t;
}

public void setXxx(String xxx) {
    this.xxx = shareString(xxx); // was "sharedString(xxx)", a name mismatch
}

You could use compression (GZip streams) for larger texts in the beans.
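
A sketch of the compression idea; note that it only pays off for texts large enough to offset the GZip header overhead:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TextCompression {

    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(text.getBytes("UTF-8"));
        gz.close(); // finishes the GZIP stream, flushing all compressed data
        return bos.toByteArray();
    }

    static String gunzip(byte[] data) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gz.read(buf)) != -1) {
            bos.write(buf, 0, n);
        }
        return bos.toString("UTF-8");
    }
}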

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
1

Don't use String if you can avoid it; replace it with StringBuffer or StringBuilder. Also, try to increase the memory. I guess 2048m is OK, but if the issue still persists, change it to 4096m or even try 6000m.

Freak
  • 6,786
  • 5
  • 36
  • 54
0

You can increase your heap size when you launch Java:

java -Xmx8G
DaoWen
  • 32,589
  • 6
  • 74
  • 101
  • @TheNewIdiot - Actually, if his default heap size is too small then this _is_ the right answer and it _is_ guaranteed to work as long as he picks a large enough heap size. Your comment would make just as little sense in reply to "oh, I am out of disk space—I should buy a second hard drive." It's only a temporary fix—I might run out of space again... – DaoWen Jul 11 '13 at 06:36
  • The files he reads need not be limited by the RAM size he has, and we don't even know if he can afford 8G. – AllTooSir Jul 11 '13 at 06:38
  • @DaoWen when he is already parsing an XML file of 800 MB, it could grow larger in the future and he will have the same problem again. – Uwe Plonus Jul 11 '13 at 06:43
  • @UwePlonus - I agree that a solution where you can stream the data would be much more elegant—but it would probably also require him to restructure a lot of code. If he can fix this now by throwing more RAM at it and then take his time to re-architect it later then it's probably a good idea to throw more RAM at it. – DaoWen Jul 11 '13 at 06:47
0

It looks like you edited the code before posting it, or posted code that isn't exactly what you run. Please correct it.

First, the code as posted will not compile.

Second, don't pass a Reader to the save function. Create and fill the Entity in process(StructuredNode node) and pass the Entity, not a Reader, to save.

Third, handle exceptions correctly in the save function. If an exception occurs, roll back the transaction.
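
A sketch of what the corrected method could look like (it assumes the same em field; MyEntity is a placeholder for the real JPA entity class):

public void save(MyEntity entity) {
    EntityTransaction tx = em.getTransaction();
    try {
        tx.begin();
        em.persist(entity);
        tx.commit();
    } catch (RuntimeException e) {
        if (tx.isActive()) {
            tx.rollback(); // don't leave the transaction open on failure
        }
        throw e;
    }
}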

user1516873
  • 5,060
  • 2
  • 37
  • 56
  • Yes sir, I have done exactly as you are telling; obj is an object of the entity class. I set the entity first and pass it to save. I just edited the code for the question to give an idea – Shiv Jul 11 '13 at 08:02
0

Finally I have solved my problem. The following things helped:

1. A heap size of 2048m is enough.

2. Another problem was that I was using String, and String objects are immutable.

By immutable, we mean that the value stored in a String object cannot be changed. The next question that comes to mind is: "If String is immutable, then how am I able to change the contents of the object whenever I wish?" Well, to be precise, it's not the same String object that reflects the changes you make; internally, a new String object is created to hold the changes.

Refer to: Difference between String, StringBuffer and StringBuilder.
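
A tiny illustration of that difference:

public class ImmutableDemo {
    public static void main(String[] args) {
        String s = "abc";
        s.concat("def");  // returns a NEW string that is discarded; s is still "abc"
        s = s + "def";    // '+' also creates a new String object each time

        StringBuilder sb = new StringBuilder("abc");
        sb.append("def"); // mutates sb's internal buffer instead
        System.out.println(s + " / " + sb); // prints: abcdef / abcdef
    }
}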

So I removed the getters and setters for entities other than the JPA entities, and inserted all data directly into the database without setting it on any intermediate objects.

3. The third and main problem was the JPA EntityManager.

My code didn't ensure that the EntityManager was always closed when the method finished. As soon as a RuntimeException occurred in the business logic, the em EntityManager remained open!

So always close it; you can also set your objects to null in the finally block, like:

finally {
    Obj1 = null;
    Obj2 = null;
    if (entityTransaction.isActive()) {
        entityTransaction.rollback();
    }
    em.clear();
    em.close();
}

Refer to: How to close a JPA EntityManager in web applications.

+1 for every answer, guys, it helped me a lot. I am not marking any single answer as accepted because I thought of posting the complete solution instead. Thanks!

Shiv
  • 4,569
  • 4
  • 25
  • 39