2

I am using Spring/Hibernate done the JPA way using org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean and configured using spring xml, persistence.xml and JPA 2 annotations.

Functionally it is fine and persisting correctly. However, I have a requirement to store entity A which has a bidirectional OneToMany with a large collection of B as quickly as possible.

I am using various options in persistence.xml to try to speed up inserts and reduce memory use (the application writes about as much as it reads)

<property name="hibernate.id.new_generator_mappings" value="true" />
<property name="hibernate.jdbc.batch_size" value="50" />
<property name="hibernate.order_inserts" value="true" />
<property name="hibernate.order_updates" value="true" />
<property name="hibernate.cache.use_query_cache" value="false" />
<property name="hibernate.cache.use_second_level_cache" value="false" />

and the persist is done using

entityManager.persist(instanceOfA)

Edit Additional info:

Each entity has a generated id like this:

@Id
    @Column(name="ID")
    @GeneratedValue(strategy=GenerationType.AUTO, generator="SEQUENCE_GENERATOR")
    @SequenceGenerator(name="SEQUENCE_GENERATOR", sequenceName="MY_SEQUENCE", allocationSize=50)
    private Long id;

which relates to an Oracle sequence

CREATE SEQUENCE MY_SEQUENCE MINVALUE 1 MAXVALUE 999999999999999999999999999 START WITH 1 INCREMENT BY 50 NOCYCLE NOCACHE NOORDER;

When I run the code with show sql on I can see lots of insert statements taking quite a while.

I have read that I need to call entityManager.flush(); entityManager.clear(); every 50 rows inserted.

http://abramsm.wordpress.com/2008/04/23/hibernate-batch-processing-why-you-may-not-be-using-it-even-if-you-think-you-are/

Does this mean that I need to break up the persist into

entityManager.persist(instanceOfA);
instanceOfA.addB(instanceOfB);
entityManager.persist(instanceofB);

adding a flush clear every 50 calls to persist()?

Is there a cleaner way of doing it? (my actual object hierarchy has about 7 layers of relations like A and B)

I was thinking about using JDBC for the inserts, but I hate writing row mappers :)

I have heard about org.hibernate.StatelessSession but there is no method of getting that from a JPA entity manager without casting to SessionFactory at some point - again not very clean.

Thanks in advance!

Matt
  • 557
  • 9
  • 17

2 Answers2

4

I faced the same problem in one of my projects. I was using Hibernate with MySQL backend with an identity ID generator. The issue with that is, Hibernate needs to hit the database once for each entity saved to actually get an ID for it. I switched to the increment generator and saw immediate benefits (all inserts got batched).

@Id
@GeneratedValue(generator = "increment")
@GenericGenerator(name = "increment", strategy = "increment")
@Column(name = "id", nullable = false)
private long id;

The increment generator generates IDs in-memory and does not need to hit the database. I am guessing that the sequence generator also needs to hit the database as it is defined in the database. The con of using increment is, Hibernate should have exclusive insert access to the database and it may fail in a clustered setup.

Another trick I used was to append rewriteBatchedStatements=true to the JDBC URL. This is MySQL specific but I think there may be a similar directive for Oracle.

And that "call flush after every n inserts" trick works too. Here is a sample code to do that (using google-guava classes):

public List<T> saveInBatches(final Iterable<? extends T> entities, final int batchSize) {
    return ImmutableList.copyOf(
        Iterables.concat(
            Iterables.transform(
                Iterables.partition(entities, batchSize),
                new Function<List<? extends T>, Iterable<? extends T>>() {
                    @Override
                    public Iterable<? extends T> apply(final List<? extends T> input) {
                        List<T> saved = save(input); flush(); return saved;
                    }})));
}

public List<T> save(Iterable<? extends T> entities) {
    List<T> result = new ArrayList<T>();
    for (T entity : entities) {
        entityManager.persist(entity);
        result.add(entity);
    }
    return result;
}
Abhinav Sarkar
  • 23,534
  • 11
  • 81
  • 97
2

Use pure JDBC for bulk/large insert. Don't use any ORM Framework for it.

B. S. Rawat
  • 1,874
  • 3
  • 22
  • 34
  • 1
    - Create PreparedStatement for each table insert once and re-use them over and over. 2) Make it Multi-Threaded app. – B. S. Rawat Oct 18 '12 at 21:00
  • I have a similar issue here, had to insert hundred of elements and this seems to be the best aproach, instead of working with many items on memory, just insert the necesessary relation in a plain query. – TheGabiRod Apr 24 '18 at 12:45