11

In my grails (1.3.7) application, I am using JDBC Template for bulk import of 1000s of records from a CSV file (since it's much faster than using vanilla GORM/hibernate, as you would expect).

e.g.

class Book {
    String title
}

and

// For each CSV record...
insertStatementList.add("insert into book (id, title) values (nextval('hibernate_sequence'), thetitle)")
...
JdbcTemplate bulkInsert = ...
bulkInsert.batchUpdate(insertStatementList)

The problem with this approach is that a change in the domain class (e.g. adding a subject attribute) requires a change to both the domain class AND the SQL insert statement.

Since the GORM/hibernate stack must ultimately be deriving SQL from the domain class definition, is there a way to access this functionality so that I don't have to maintain separately the SQL insert statement? Or in pseudo-code, is something like the following possible:

// would return something like:
// "insert into book (id, title) values (nextval('hibernate_sequence'), 'thetitle')"
def insertStatement = hibernate.getSqlInsertForClass(Book, book.properties)
Jon Burgess
  • 2,035
  • 2
  • 17
  • 27
  • I don't what's your optimization requirement, but you can still use hibernate, keep the flexibility of the gorm mapping and get amazing performance. Consider this: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql/ --- As for gettting the hibernate insert statement, I don't think they expose an api for that, so you'd have to intercept the log or jdbc driver. Quite a hassle. – Raphael Sep 24 '12 at 00:52
  • @Raphael I have tried the approach in the link (flushing hibernate cache etc) but I have found that using JDBC template compared to GORM is about 3x as fast, with my profiler showing that most of the extra time is spent in construction of GORM (and associated) objects. – Jon Burgess Sep 24 '12 at 03:26
  • I should add that using the approach in the link did provide a significant speed up in the first place - but still not as good as JDBC template. – Jon Burgess Sep 24 '12 at 03:30
  • You cloud try to build your sql statement using the persistent properties mapped from the domain. This method returns such list: _new DefaultGrailsDomainClass(YourDomainClazz).persistantProperties_ – Raphael Sep 24 '12 at 18:04
  • @Raphael thanks for the suggestions - but I don't think it really answers my question. Yes, I could write some code to build the SQL dynamically from the domain class def, but my point is that hibernate must already be doing this (not publicly accessible perhaps, but nevertheless, it's happening) – Jon Burgess Sep 28 '12 at 06:56

1 Answers1

1

I don't know grails enough to say if that's possible within it. I tried generating the SQL insert fields by listing the class properties and composing it dynamically, but it gets out of order.

You could create an annotation to define the position of that field in your CSV:

import java.lang.annotation.*

@Retention(RetentionPolicy.RUNTIME) @interface CSV { int position() }

class Product {
  @CSV(position=1) String description
  @CSV(position=3) BigDecimal price
  @CSV(position=4) Integer type
  @CSV(position=2) boolean soldout
}

If you need multiple mappings (to support older CSVs of your own, for example), you should consider a map structure or a XML, which would be detached from the entity.

Then you need to iterate the fields to compose the query:

def csv = '''rice;10.0;3;false
beet;12.0;2;false
mango;22.0;2;true'''

def properties = Product.declaredFields
  .findAll { it.declaredAnnotations }
  .sort { it.declaredAnnotations[0].position() }

def templateQuery = "INSERT INTO product(#fields) VALUES (#values)"

csv.eachLine { line ->
  def fields = line.split( /;/ )
  def sqlFields = [:]

  fields.eachWithIndex { field, i ->
    sqlFields[properties[i].name] = field
  }

  println templateQuery
    .replace('#fields', sqlFields.keySet().join(","))
    .replace('#values', sqlFields.values().join(","))
}

Which prints:

INSERT INTO product(description,price,type,soldout) VALUES (rice,10.0,3,false)
INSERT INTO product(description,price,type,soldout) VALUES (beet,12.0,2,false)
INSERT INTO product(description,price,type,soldout) VALUES (mango,22.0,2,true)

It is pretty raw and need some polishing, like quotes and some stuff against sql injection, but it works, more like a proof of concept.

In an older system i worked, we used jboss seam and on those bulk stuff we worked with jpa batch, i.e. manually flushing when all the persist() were done. Are you sure there isn't anything like that on grails to use with gorm?

This link shows a blog post of using a batch update within grails, using the withTransaction whilst regularly applying a session clear:

    List <Person> batch =[]
    (0..50000).each{
       Person person= new Person(....)
        batch.add(person)
        println "Created:::::"+it
        if(batch.size()>1000){
            Person.withTransaction{
                for(Person p in batch){
                    p.save()
                }
            }
            batch.clear()
        }
      session = sessionFactory.getCurrentSession()
      session.clear()             
    }

You sure that doesn't work? If not, then the annotation might be a solution.

Also there's the problem with the column naming, due to the difference between java's camel case and the underscore in db. The guy behind that translation, in hibernate, is the ImprovedNamingStrategy. Maybe you can get something from him. Or adding the column name in you @CSV annotation. Sounds like recycling JPA :-).

There is also log4jdbc, but i think it won't solve your problem: you'd need to sneak into hibernate sql generation.

Will
  • 14,348
  • 1
  • 42
  • 44