1

This questions is similar to these:

I have to index certain features from the Wikipedia XML dump. The parsing is fast. However, insert is slow.

Switching off indexing doubled the speed.

I batch insert like this:

    articles.grouped(5000)
           .foreach {
              batch: IterableView[(Article, List[Category], List[Link]), Iterable[_]] =>
                //Save each batch in one transaction
                database withTransaction {
                 implicit session =>
                   for(i <- batch) {
                     articles += i._1
                     categories ++= i._2
                     links ++= i._3
                   }

               }
            }

I read that journal_mode = MEMORY and synchronous = off increase the insert speed. How do I set these with slick? I am using c3p0 as a connection pool and added PRAGMA journal_mode = MEMORY to preferredTestQuery. I don't believe this is the right way to set these options.

Thanks for your help!

Community
  • 1
  • 1
Karsten
  • 882
  • 6
  • 18
  • This is a [FAQ](http://www.sqlite.org/faq.html#q19); you should use one transaction for all inserts. – CL. Dec 17 '13 at 11:54
  • This is what `withTransaction` does or should do. One transaction for all Wikipedia articles might be a little too much. – Karsten Dec 17 '13 at 12:21
  • What do you mean with "too much"? Multiple transaction certainly are too slow … – CL. Dec 17 '13 at 13:17
  • Well, if I add all articles in one transaction I will have millions of inserts which add up to around 40GB. If I cannot switch of journaling this might cause problems. If there is an error I will loose everything instead of "just" 5000 articles. – Karsten Dec 17 '13 at 14:25

2 Answers2

2

It seems like you are reading data once, locally. You could just not use transactions at all and use withSession instead. And if you still need a pragma you can set it via plain SQL. You probably want to reset the pragma after use to not leave a side-effect.

import scala.slick.jdbc.StaticQuery.interpolation

database withSession {
  implicit session =>
  sqlu"PRAGMA synchronous=OFF".execute
  articles.grouped(5000)
         .foreach {
            batch: IterableView[(Article, List[Category], List[Link]), Iterable[_]] =>
              //Save each batch in one transaction
               for(i <- batch) {
                 articles += i._1
                 categories ++= i._2
                 links ++= i._3
               }
           }
}

Also interesting to know is that not only database has a withTransaction method, but also session. So you can do session.withTransaction within a withSession block re-using the same connection.

cvogt
  • 11,260
  • 30
  • 46
  • `sqlu"PRAGMA ...".execute` worked. I used `sql"PRAGMA..."` which did not work. I look into using just one session. – Karsten Dec 17 '13 at 14:35
  • I tested on a subset that includes ca. 5500 articles. switching off journaling and synchronous makes a difference. However, moving the `withSession` outside of the foreach does not make a big difference. I guess that is because there is no indexing or journaling. – Karsten Dec 17 '13 at 15:09
2

In Slick 3.2.0, the following lines seem to work:

import org.sqlite.SQLiteConfig
import slick.jdbc.JdbcBackend.Database
import slick.jdbc.SQLiteProfile.api._
...
val sqliConfig = new SQLiteConfig();
sqliConfig.setJournalMode(SQLiteConfig.JournalMode.MEMORY)
sqliConfig.setSynchronous(SQLiteConfig.SynchronousMode.OFF)
val emailsDB = Database.forURL(
  "jdbc:sqlite:/path/to/my/dbfile.sqlite",
  driver = "org.sqlite.JDBC",
  prop = sqliConfig.toProperties
)

where org.sqlite is from :

libraryDependencies += "org.xerial" % "sqlite-jdbc" % "3.16.1"
amkhlv
  • 307
  • 2
  • 14