
I am using a Java 8 parallel stream to insert data into the DB. The following is the code:

customers.parallelStream().forEach(customer -> {

        UserTransaction userTransaction = new UserTransactionImp();
        try {
            // the timeout must be set before begin() for it to apply to this transaction
            userTransaction.setTransactionTimeout(300);
            userTransaction.begin();
            //CODE to write data to DB for each customer in a global transaction using Atomikos and Hibernate
            userTransaction.commit();
        } catch (Exception e) {
            try {
                userTransaction.rollback();
            } catch (Exception rollbackFailure) {
                // rollback() declares checked exceptions, so it needs its own handling
            }
        }
    });

It takes more than 2 hours to complete the task. I did the same test split across two instances (two Java main methods) and the total time came down to 1 hour. Is there any other way to scale up within one Java instance? I am using Atomikos and Hibernate for persistence, and I have configured JDBC batching, insert ordering, and update ordering. Everything is batched properly and working fine, but I observed that CPU utilization never goes above 30% during this. Is there any way to utilize more processors and scale this up?
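For reference, the batching-related configuration is along these lines (the property names are the standard Hibernate ones; the batch size of 50 is only an illustrative value):

    import java.util.Properties;

    // Hibernate settings enabling JDBC batching and statement ordering
    Properties hibernateProps = new Properties();
    hibernateProps.setProperty("hibernate.jdbc.batch_size", "50");  // flush inserts/updates in JDBC batches
    hibernateProps.setProperty("hibernate.order_inserts", "true");  // group inserts by entity type so they batch
    hibernateProps.setProperty("hibernate.order_updates", "true");  // group updates by entity type so they batch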

nanpakal
  • Are you using a connection pool so they don't block each other when communicating with the database? Any other shared resource you're using when populating the database that may be causing the threads to block each other? – Raniz Jul 19 '17 at 06:36
  • 1
    You should also be aware that using parallel streams come with some caveats, especially when running I/O code on them. https://dzone.com/articles/think-twice-using-java-8 – Raniz Jul 19 '17 at 06:36
  • Try using a custom ForkJoinPool to run your parallel stream and tweak the number of threads (see the sketch after these comments). – Jure Kolenko Jul 19 '17 at 06:39
  • I used a custom ForkJoinPool. Still the same result. – nanpakal Jul 19 '17 at 06:41
  • And yes, I am using a connection pool. – nanpakal Jul 19 '17 at 06:42
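For reference, the custom-pool trick mentioned in the comments usually looks roughly like this. It is only a minimal sketch: the pool size of 16 is an arbitrary example, and the lambda body stands in for the transactional write from the question.

    import java.util.concurrent.ForkJoinPool;

    // Run the parallel stream inside a dedicated ForkJoinPool instead of the common pool,
    // so the number of worker threads can be tuned independently of availableProcessors().
    ForkJoinPool customPool = new ForkJoinPool(16);
    try {
        customPool.submit(() ->
            customers.parallelStream().forEach(customer -> {
                // per-customer transactional write, as in the question
            })
        ).join(); // wait for the whole stream to finish
    } finally {
        customPool.shutdown();
    }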

1 Answer


parallelStream() basically gives you a "default" implementation. I heard a guy once say: "whenever you use this construct, measure its effects".

In other words: when you are not happy with the default implementation, you might have to look into your own implementation - one focused not on that single operation but on the "whole picture".

Example: what if you "batch" together 5, 10, 50 "users" per "shot" - meaning: you reduce the number of transactions, but you allow more content to go into each.
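A rough sketch of that idea, assuming customers is a List (the chunk size of 50, the Customer element type, and the empty write step are placeholders, not code from the question):

    import java.util.ArrayList;
    import java.util.List;

    // Split the customers into chunks and run one transaction per chunk,
    // so each transaction carries many rows instead of one.
    int chunkSize = 50;
    List<List<Customer>> chunks = new ArrayList<>();
    for (int start = 0; start < customers.size(); start += chunkSize) {
        chunks.add(customers.subList(start, Math.min(start + chunkSize, customers.size())));
    }

    chunks.parallelStream().forEach(chunk -> {
        UserTransaction userTransaction = new UserTransactionImp();
        try {
            userTransaction.setTransactionTimeout(300);
            userTransaction.begin();
            // write every customer in this chunk within one global transaction
            userTransaction.commit();
        } catch (Exception e) {
            try {
                userTransaction.rollback();
            } catch (Exception rollbackFailure) {
                // rollback itself can throw; log and move on
            }
        }
    });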

Yes, that is a pretty generic answer - but this is a pretty generic question. We have absolutely no insight into what your code is doing there - so nobody here can tell you the "perfect" way to reduce overall runtime.

Beyond that: you want to profile your whole setup. Maybe your problem is not the "java" part - but your database. Not enough memory, too much workload ... or network, or, or, or. In other words: first focus on understanding where your performance bottleneck truly lies.

(a good read about "performance" and bottlenecks: the old classic "Release It!" by Michael Nygard)

GhostCat