
I am currently building my first REST API, based around an RSS aggregator. I have it implemented using one of two traits, either a MemoryBasedDB or PostgresDB. On each access to the root url, it will make an asynchronous call out to the feed to grab the newest articles, and return that as an XML string for parsing. After parsing, it is persisted in the database as an Article object.

Functionally, it's fine either way for me. However, when load testing with weighttp or gatling at 1k requests/1k concurrent users, the Postgres version fails with the following:

In weighttp:

error: read() failed: Connection reset by peer (104)

And in my server logs:

final [WARN] [09/21/2014 14:45:27.224] [on-spray-can-akka.actor.default-dispatcher-36] [akka://on-spray-can/user/IO-HTTP/listener-0/523] Configured registration timeout of 1 second expired, stopping

I believe it has something to do with the way my queries are laid out. They are blocking, and because each actor has to wait for a response, the load behind them piles up higher and higher to the point of failure (timeout). However, in my research I have only been able to find this asynchronous driver for Postgres, which is currently incompatible with Squeryl (to my understanding).

How can I make DB access faster? Currently I am achieving ~10-15req/s using Postgres, and ~400req/s using in-memory persistence.

My model:

case class Article(id: Option[String], idint: Option[Int], title: String, author: String, published: String, updated: String, `abstract`: Option[String], content: Option[String], link: Option[String])

My queries:

trait PostgresDB extends Schema {

  val articles = table[Article]("articles")
  on(articles)(e => declare(e.idint is(unique)))

  def create(x: Article) = inTransaction {
      articles.insert(x)
  }

  def getAll: Set[Article] = inTransaction {
      from(articles)(article => select(article)).toSet
  }

  def getArticle(x: Int) = inTransaction {
      from(articles)(article => where(article.idint === Some(x)) select(article)).toList(0)
  }

  def printy = transaction {
      articles.schema.printDdl(println(_))
  }
}

So far I have tried:

  • Implementing C3P0 for connection pooling. No real change.
  • Adjusting postgresql.conf for performance. Small positive change.
  • Adjusting application.conf for spray/akka for performance. Small positive change.

Relevant info:

  • Kernel:
    • Linux 3.13.0-33-generic #58-Ubuntu SMP Tue Jul 29 16:45:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
  • Postgres 9.3
  • Scala 2.10.4
  • Spray 1.3.1
  • Akka 2.3.5
Joseph Sawyer

3 Answers


Hopefully each REST actor is not blocking on its own DB request? Do they delegate to a separate pool of DB actors with persistent connections?

experquisite

Yes, I agree with @experquisite: give the DB actors their own dispatcher, and tune its thread pool size and the number of actors to the number of concurrent requests that your DB can handle. Place a router in front of them. You should also measure the database server's disk queue length. It should stay stable under sustained high load; keep adding threads until the queue starts to grow.

Another approach is to use a thread pool and futures for your DB access layer. That seems to be easier to configure, but it lacks supervision and error recovery; see http://www.chrisstucchio.com/blog/2013/actors_vs_futures.html. Personally, I still use actors for concurrency.
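A minimal sketch of the futures approach, using only the Scala standard library. The names `BlockingDbSketch`, `dbPoolSize`, `blockingLookup` and `lookupAsync` are hypothetical, the pool size of 8 is an arbitrary starting point to tune, and the lookup sleeps instead of calling Squeryl, so treat it as an illustration of the idea rather than a drop-in replacement for the trait in the question:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object BlockingDbSketch {
  // Assumption: roughly the number of concurrent queries the database can serve.
  val dbPoolSize = 8

  // Dedicated fixed-size pool, so blocking queries never occupy the
  // threads that handle incoming HTTP requests.
  private val dbExecutor = Executors.newFixedThreadPool(dbPoolSize)
  val dbContext: ExecutionContext = ExecutionContext.fromExecutor(dbExecutor)

  // Stand-in for a blocking query such as getArticle; it sleeps
  // instead of talking to Postgres.
  def blockingLookup(id: Int): String = {
    Thread.sleep(20)
    "article-" + id
  }

  // Runs the blocking call on the dedicated pool and returns immediately.
  def lookupAsync(id: Int): Future[String] =
    Future(blockingLookup(id))(dbContext)

  // Releases the pool's threads when the application stops.
  def shutdown(): Unit = dbExecutor.shutdown()
}
```

The caller gets a `Future` back right away and can complete the HTTP response when it resolves, so at most `dbPoolSize` queries block at any one time while the remaining requests queue up on the pool instead of tying up request-handling threads.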

I have never used Squeryl. Do the inTransaction blocks create DB transactions? The database trait you show doesn't seem to need transactions; have you tried without them?

iain
  • Yes, it creates a new transaction if there isn't one currently executing; see http://squeryl.org/sessions-and-tx.html. I had come across that article before, but at the time couldn't really reason about futures very well. I am now making use of them elsewhere in the same program, so I will give it a try here. – Joseph Sawyer Sep 22 '14 at 02:16
  • Using Futures to manage the calls has helped tremendously: I am now averaging ~300req/s using the disk-based DB. Thank you for your help! – Joseph Sawyer Sep 23 '14 at 01:33
  • Great. DB throughput of 3/4 of the pure in-memory implementation is very good. – iain Sep 25 '14 at 10:39

I'm not an expert in these questions, but doesn't it seem reasonable to split performance testing into separate areas? For example, measure how performant spray is and how performant Squeryl is independently.

Side note: a value of "~10-15req/s" is very, very low. It shouldn't be like that.

Another note: as far as I remember, C3P0 should make a real difference. Are you sure you set it up correctly?

I would also suggest being careful with async code and thread pools, and avoiding them if they are not necessary. They make the code more complex and open a whole new area for bugs. (Async is still cool in some use cases, but I hope I made my point clear already.)

VasiliNovikov
  • Ah, I remembered some numbers. As far as I remember, a simple set-up (with C3P0) took around 2-4ms for a simple request with h2 in embedded mode, and around 6-10ms with h2 not embedded. The response times were not affected much by the overall load until the load reached the possible maximum, at which point response times blew up and became very large. Anyway, I did not dig into this very deeply; it was just some basic testing as an experiment. – VasiliNovikov Sep 29 '14 at 17:44