
I have a large number of classes that have been generated and annotated using the EMF/Texo combination. I persist them on a SQL Server Database using JPA/Eclipselink.

This works fine; however, performance is horrible when a large number of objects needs to be persisted. So I've written two test cases (see TestBulkInserts.java below) that compare a bulk insert through the framework (foo) with a plain JDBC bulk insert (bar).

When inserting 10,000 objects, which is below the average bulk size for my use case, foo() and bar() give the following times:

  • Duration JPA/Texo: 19.620ms

  • Duration plain JDBC: 892ms

I'm wondering why there is such a huge difference (more than a factor of 20!). With larger sizes it gets even worse.

The DatabaseObject class extends the PersistableObjectClass.java (see below), and both are generated (including the respective DAO class) with Texo + EMF.

I haven't added any particular settings in the persistence.xml, except the necessary connection details.
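(For reference, that means EclipseLink batch writing is left at its default, i.e. disabled. Enabling it in persistence.xml would look something like the sketch below; the property names are standard EclipseLink settings, but the batch size value is just an example.)

```xml
<!-- Sketch: enable EclipseLink JDBC batch writing.
     Property names are EclipseLink's; the size value is an example. -->
<properties>
  <property name="eclipselink.jdbc.batch-writing" value="JDBC"/>
  <property name="eclipselink.jdbc.batch-writing.size" value="1000"/>
</properties>
```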

TestBulkInserts.java:

import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
...
import com.ownproject.loader.generated.DbModelPackage;
import com.ownproject.loader.DatabaseObject;
import com.ownproject.loader.dao.DatabaseObjectDao;
import javax.persistence.Persistence;
import org.eclipse.emf.texo.server.store.EntityManagerProvider;
import org.junit.Test;

public class TestBulkInserts {

 private static final int NUM_LOOPS = 10000;

 @Test
 public void foo() {
  TestMethods.connectTestDBandEMF();
  // basically does this
  // DbModelPackage.initialize();
  // EntityManagerProvider.getInstance().setEntityManagerFactory(Persistence.createEntityManagerFactory(PERSISTENCE_UNIT_TEST));

  Stopwatch sw = Stopwatch.createStarted();

  DatabaseObjectDao dao = new DatabaseObjectDao();
  dao.getEntityManager().getTransaction().begin();
  for (int i = 0; i < NUM_LOOPS; i++) {
    DatabaseObject dbo = new DatabaseObject();
    dbo.setString(UUID.randomUUID().toString());
    dbo.setInsert_time(Date.valueOf(LocalDate.now()));
    dao.insert(dbo);
  }
  dao.getEntityManager().getTransaction().commit();

  sw.stop();
  System.out.println(String.format("Duration JPA/Texo: %,dms", sw.elapsed(TimeUnit.MILLISECONDS)));
  }

 @Test
 public void bar() throws ClassNotFoundException, SQLException {
  Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
  String connectionUrl = "jdbc:sqlserver://hostname:1433;databaseName=local_test;user=sa;password=blablub;";
  Connection con = DriverManager.getConnection(connectionUrl);
  con.setAutoCommit(false);

  Stopwatch sw = Stopwatch.createStarted();

  PreparedStatement insertStatement = con.prepareStatement("INSERT INTO DatabaseObject(b_id, insert_time) VALUES (?, ?)");
  for (int i = 0; i < NUM_LOOPS; i++) {
    insertStatement.setString(1, UUID.randomUUID().toString());
    insertStatement.setDate(2, Date.valueOf(LocalDate.now()));
    insertStatement.addBatch();
  }
  insertStatement.executeBatch();
  con.commit();
  con.close();

  sw.stop();
  System.out.println(String.format("Duration plain JDBC: %,dms", sw.elapsed(TimeUnit.MILLISECONDS)));
  }
}

PersistableObjectClass.java:

import javax.persistence.Basic;
...
import javax.persistence.TemporalType;

@Entity(name = "PersistableObjectClass")
@MappedSuperclass()
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
public abstract class PersistableObjectClass {

  @Basic()
  @Temporal(TemporalType.TIMESTAMP)
  private Date insert_time = null;

  @Id()
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private int s_id = 0;

...
}
KayleeTheMech
  • Read this carefully: https://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/Hibernate_User_Guide.html#batch-session-batch – zloster May 22 '17 at 19:54
  • Thank you for your hint! Did you mean the memory issue? Because flushing didn't reduce the loading time. – KayleeTheMech May 23 '17 at 14:17
  • 2
    JPA adds overhead, so some difference is to be expected, but there are too many variables to explain your numbers. Have you turned logging on to see the SQL that is generated to make sure you are really making apples to apples comparisons? What settings are you using in JPA/EclipseLink? Why are you using Identity for sequence management instead of something that allows preallocation that matches your batch sizes? – Chris May 23 '17 at 14:25
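(As the last comment hints, GenerationType.IDENTITY forces the provider to execute inserts one at a time in order to read back each generated key, which defeats statement batching. A sequence-based generator with an allocationSize lets the provider preallocate ids in memory and batch the inserts. A hypothetical mapping sketch — the generator/sequence names and the allocation size are made up, and SQL Server supports sequences from version 2012 on:)

```java
// Sketch only: sequence-based id generation with preallocation,
// replacing GenerationType.IDENTITY. Names and allocationSize are hypothetical.
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "dbo_seq")
@SequenceGenerator(name = "dbo_seq", sequenceName = "dbo_seq", allocationSize = 1000)
private int s_id;
```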

1 Answer

Not only do you need to use batch updates, but you also need to ensure that the transaction is committed at regular intervals; otherwise you'd end up with a long-running transaction, which is bad on both 2PL and MVCC database engines.

So, this is how your batch job should look:

int entityCount = 50;
int batchSize = 25;
 
EntityManager entityManager = null;
EntityTransaction transaction = null;
 
try {
    entityManager = entityManagerFactory()
        .createEntityManager();
 
    transaction = entityManager.getTransaction();
    transaction.begin();
 
    for ( int i = 0; i < entityCount; ++i ) {
        if ( i > 0 && i % batchSize == 0 ) {
            entityManager.flush();
            entityManager.clear();
 
            transaction.commit();
            transaction.begin();
        }
 
        Post post = new Post( 
            String.format( "Post %d", i + 1 ) 
        );
        entityManager.persist( post );
    }
 
    transaction.commit();
} catch (RuntimeException e) {
    if ( transaction != null && 
         transaction.isActive()) {
        transaction.rollback();
    }
    throw e;
} finally {
    if (entityManager != null) {
        entityManager.close();
    }
}
Vlad Mihalcea