14

Does running multiple Solr shards on a single machine improve performance? I would expect Lucene to be multi-threaded, but it doesn't seem to be using more than a single core on my server with 16 physical cores. I realize this is workload dependent, but any statistics or benchmarks would be very useful!

cberner
  • 2
    Did you read Hacker News yesterday, by any chance? http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ – Jesvin Jose Mar 25 '12 at 06:13
  • 3
    Yep, I wrote that :) I was hoping other people had some stats that I could compare with though – cberner Mar 25 '12 at 18:18
  • @cberner Is any of this true for indexing performance, or is that a completely different animal? I need to update my index frequently with user content and am looking to speed it up. – ted.strauss Nov 21 '12 at 18:00
  • 2
    @ted.strauss I didn't test it with indexing, since we were only indexing tens or hundreds of items per second. My guess would be that indexing is different and wouldn't benefit, but that's just a guess. However, one thing I found helped a lot with indexing was enabling soft commits, if you need near-real-time updates – cberner Nov 21 '12 at 23:25
  • @cberner Thanks for your helpful comments, especially since my question is languishing: http://stackoverflow.com/q/13500955/241677 – ted.strauss Nov 22 '12 at 12:23

2 Answers

13

I ran some benchmarks of our search stack and found that adding more Solr shards (on a single machine with 16 physical cores) did improve performance, up to about 8 shards (where I got a 6.5x speedup). This is on an index with ~1.5 million documents, running complex range queries.

So it seems that Solr doesn't take advantage of multiple physical cores when running queries against a single index.
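
For anyone who wants to try a similar setup, here is a rough sketch of how one query can be fanned out over several shards on the same machine using Solr's `shards` request parameter. The core names (`shard0` to `shard7`), the port, the example range query, and the helper `sharded_query_url` are all assumptions made for illustration; they are not the configuration used in the benchmark above.

```python
from urllib.parse import urlencode


def sharded_query_url(coordinator, shards, query, rows=10):
    """Build a Solr select URL that fans one query out to several shards.

    coordinator: base URL of the core that receives the request.
    shards: shard addresses as host:port/path, without a scheme.
    """
    params = {
        "q": query,
        "rows": rows,
        # Distributed search takes a comma-separated list of shards.
        "shards": ",".join(shards),
    }
    return coordinator.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical layout: 8 cores named shard0..shard7 on one local Solr instance.
shards = ["localhost:8983/solr/shard%d" % i for i in range(8)]
url = sharded_query_url(
    "http://localhost:8983/solr/shard0",
    shards,
    "price:[1000 TO 5000]",  # made-up range query
)
print(url)
```

The core that receives the request merges the per-shard results, which is where the sharding overhead comes from.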

cberner
  • Since your index fits into the I/O cache, sharding improved latency. But this should not be taken as general advice: what would happen with a bigger index? In a real-time context? And you don't measure throughput: what happens when the concurrency level increases? Could you run your experiments again, but with a higher number of threads sending queries to Solr (20, for example)? – jpountz Mar 24 '12 at 23:11
  • I don't know about a bigger index, but for real-time search I would expect indexing performance to improve, since the writes would be spread out over multiple shards. I'll try to run some throughput tests next week. I wouldn't expect there to be too much difference though, since the overhead of sharding was < 20% – cberner Mar 25 '12 at 02:15
0

If you currently have a single box with a single shard, then splitting this shard into several shards:

  • is likely to worsen throughput,
  • may improve latency, by parallelizing query execution.

I can't provide you with statistics or benchmarks because it depends on whether query execution is CPU bound or I/O bound: if query execution is already I/O bound on a single box, then splitting the shard into several shards will only worsen throughput. You will need to test this yourself: take a production log and replay it in both scenarios.
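
As a starting point for replaying a log, here is a minimal sketch that sends a list of query URLs with a fixed number of client threads and reports both throughput and latency. The function names (`fetch`, `replay`, `percentile`) are made up for this example, and it assumes you have already turned your production log into a list of full Solr URLs. Run it once against the single-shard setup and once against the sharded one, at several concurrency levels.

```python
import math
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Send one query over HTTP and read the whole response."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()


def replay(urls, concurrency, send=fetch):
    """Replay urls using `concurrency` client threads.

    Returns (queries per second, sorted per-query latencies in seconds).
    """
    def timed(url):
        start = time.perf_counter()
        send(url)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, urls))
    wall = time.perf_counter() - wall_start
    throughput = len(latencies) / wall if wall > 0 else float("inf")
    return throughput, latencies


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]
```

For example, `replay(urls, 1)` approximates the single-client latency case, while `replay(urls, 20)` shows what happens to throughput and to `percentile(latencies, 95)` under concurrent load.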

jpountz