Best clock or number generator function for concurrency/scalability on Erlang OTP 18?

Question

I am currently converting an Erlang application from OTP 17 to 18. Scalability and performance are prime directives in the design. The program needs a way to differentiate and sort new input as it comes in, either with lots of unique, monotonically increasing numbers (a continuous stream of them) or with some other mechanism. The current version (17) did not use now() for this because it is a scalability bottleneck (global lock), so it made do with reading the clock and doing other things to generate the tags for the incoming data. I'm trying to figure out the best way to do this in 18 and have some interesting results from the tests I've run.

I expected erlang:unique_integer([monotonic]) to have poor results, because I expected it to have a global lock like now(). I expected one of the clock functions to have the best results, assuming the clock could be read in parallel. Instead, erlang:unique_integer([monotonic]) gets the best results out of all the functions I benchmarked, and the clock functions do worse.

Could someone explain the results, tell me which erlang functions SHOULD give the best results, and which things (clocks, number generators, etc) are or are not globally locked in 18? Also, if you see any issues with my test methodology, by all means point them out.

TEST PLATFORM/METHODOLOGY

windows 7 64 bit
erlang otp 18 (x64)
2 intel cores (celeron 1.8GHz)
2 erlang processes spawned to run each test function concurrently 500000 times
    for a total of 1000000 times, timed with timer:tc
each test run 10 times in succession and all results recorded
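
The benchmark code itself is not included in the question. A harness matching the description above might look roughly like the following sketch; the module name bench_sketch and the functions run/3 and demo/0 are hypothetical, and this is not the code that produced the numbers below. Process spawn time is included in the measurement here, which is one of the details that could skew results.

```erlang
-module(bench_sketch).
-export([run/3, demo/0]).

%% Run Fun() PerProc times in each of Procs concurrently spawned processes.
%% Returns the wall-clock time for the whole batch in microseconds
%% (timer:tc/1 reports microseconds).
run(Fun, Procs, PerProc) ->
    {Micros, ok} = timer:tc(fun() -> spawn_and_wait(Fun, Procs, PerProc) end),
    Micros.

spawn_and_wait(Fun, Procs, PerProc) ->
    Parent = self(),
    Ref = make_ref(),
    Pids = [spawn_link(fun() ->
                               loop(Fun, PerProc),
                               Parent ! {Ref, self()}
                       end) || _ <- lists:seq(1, Procs)],
    %% Wait until every worker has reported completion.
    [receive {Ref, Pid} -> ok end || Pid <- Pids],
    ok.

loop(_Fun, 0) -> ok;
loop(Fun, N) ->
    _ = Fun(),
    loop(Fun, N - 1).

%% 2 processes x 500000 calls, repeated 10 times, as described above.
demo() ->
    Fun = fun() -> erlang:unique_integer([monotonic]) end,
    [run(Fun, 2, 500000) || _ <- lists:seq(1, 10)].
```

Swapping the fun passed to run/3 (for example fun() -> os:system_time() end) would give the other rows.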

BASELINE TEST, SEQUENTIAL

erlang:unique_integer([monotonic])
47000-94000

PARALLEL TIMES

erlang:unique_integer([monotonic])
~94000

ets:update_counter
450000-480000

erlang:monotonic_time
202000-218000

erlang:system_time
218000-234000

os:system_time
124000-141000

calendar:universal_time
453000-530000

If you are concerned about performance and uniqueness but not about monotonicity or randomness, why don't you use multiple servers, each of them returning a different range of numbers (for example a sequence of integers of the form LastN + 16, starting the first server at 0, the second at 1, and so on with 16 servers)? – Pascal, Aug 19 '15 at 05:58
@hynek, I know, it's written in my comment, but, although it is mentioned in the question as a possible solution, I didn't see monotonicity as a strong requirement. – Pascal, Aug 20 '15 at 03:56
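
The striped-server idea from the first comment could be sketched as below. This is only an illustration under stated assumptions: the module name stride_sketch and the functions start/1, next/1 and demo/0 are hypothetical, and plain processes are used instead of gen_server to keep it short. The numbers are unique across servers but, as the comments note, not globally monotonic.

```erlang
-module(stride_sketch).
-export([start/1, next/1, demo/0]).

%% Start Count counter processes. Server K (0-based) hands out
%% K, K + Count, K + 2*Count, ... so no two servers ever collide.
start(Count) ->
    [spawn_link(fun() -> loop(K, Count) end) || K <- lists:seq(0, Count - 1)].

%% Ask one server for its next number.
next(Pid) ->
    Ref = make_ref(),
    Pid ! {next, self(), Ref},
    receive {Ref, N} -> N end.

loop(N, Stride) ->
    receive
        {next, From, Ref} ->
            From ! {Ref, N},
            loop(N + Stride, Stride)
    end.

%% With 16 servers, alternating between the first two yields [0,1,16,17].
demo() ->
    [S0, S1 | _] = start(16),
    [next(S0), next(S1), next(S0), next(S1)].
```

Each caller would pick one server (for instance by scheduler id or a hash of its pid) so that contention is spread over the 16 processes.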

Hynek -Pichi- Vychodil · Accepted Answer · 2015-08-18T15:49:58.650

If you ask about test methodology, I would expect you to also include your code, because a small mistake in the benchmark code can ruin the results. So I wrote a benchmark and put it in a Gist so that we can compare results using the same code. YMMV, especially because I use Linux and timers depend strongly on the underlying OS. Here are my results:

$ uname -a
Linux hynek-notebook 4.1.0-1-amd64 #1 SMP Debian 4.1.3-1 (2015-08-03) x86_64 GNU/Linux
$ grep 'model name' /proc/cpuinfo 
model name      : Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
model name      : Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
model name      : Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
model name      : Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
$ erl
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.0  (abort with ^G)
1> c(test).
{ok,test}
2> test:bench_all(1).
[{unique_monotonic_integer,{38341,39804}},
 {update_counter,{158248,159319}},
 {monotonic_time,{217531,218272}},
 {system_time,{224630,226960}},
 {os_system_time,{53489,53691}},
 {universal_time,{114125,116324}}]
3> test:bench_all(2).
[{unique_monotonic_integer,{40109,40238}},
 {update_counter,{307393,338993}},
 {monotonic_time,{120024,121612}},
 {system_time,{123634,124928}},
 {os_system_time,{29606,29992}},
 {universal_time,{177544,178820}}]
4> test:bench_all(20).
[{unique_monotonic_integer,{23796,26364}},
 {update_counter,{514835,527087}},
 {monotonic_time,{91916,93662}},
 {system_time,{94615,96249}},
 {os_system_time,{27194,27598}},
 {universal_time,{317353,340187}}]
5>

The first thing I should note: only erlang:unique_integer/0,1 and ets:update_counter/3,4 generate unique values. Even erlang:monotonic_time/0 can return the same timestamp twice! So if you want a unique number, you have no real option other than erlang:unique_integer/0,1. If you want a unique monotonic timestamp, you can use {erlang:monotonic_time(), erlang:unique_integer()}, or, if you don't need the time part, erlang:unique_integer([monotonic]). If you don't need the value to be both monotonic and unique, you can use the other options. So if you need a unique monotonic number, there is only one good option: erlang:unique_integer([monotonic]).
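
As a rough sketch of how such tags could be attached to incoming data and used for sorting: the module name tag_sketch and the functions tag/1, sort/1 and demo/0 are hypothetical. The sketch uses erlang:unique_integer([monotonic]) rather than the plain erlang:unique_integer() as the second element, a deliberate swap so that two tags with the same timestamp still sort in creation order. The tags are only meaningful within a single runtime system instance.

```erlang
-module(tag_sketch).
-export([tag/1, sort/1, demo/0]).

%% Attach a unique, sortable tag to an input term.
%% Time may repeat between calls; the unique integer breaks the tie.
tag(Input) ->
    Time = erlang:monotonic_time(),
    UMI = erlang:unique_integer([monotonic]),
    {{Time, UMI}, Input}.

%% Sort tagged inputs by their {Time, UMI} tag.
sort(Tagged) ->
    lists:keysort(1, Tagged).

%% Tag three inputs in order, scramble them, and recover the original
%% order from the tags alone. Returns [a,b,c].
demo() ->
    Tagged = [tag(X) || X <- [a, b, c]],
    Shuffled = lists:reverse(Tagged),
    [Input || {_Tag, Input} <- sort(Shuffled)].
```

If the time part is not needed, tag/1 could return {erlang:unique_integer([monotonic]), Input} instead and sort/1 would work unchanged.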

The second thing I should note: spawning two processes is not enough to test scalability. As you can see, when I use os:timestamp/0 with 20 processes, it starts to catch up with erlang:unique_integer/0,1. And there is another problem. We both use HW with only two CPUs. That is far too few to test scalability. Imagine how the results would look on HW with 64 or more cores.

Edit: Using {write_concurrency, true} improves ets:update_counter, but it is still far behind erlang:unique_integer/0,1.

2> test:bench(test:update_counter(),1).
{203830,213657}
3> test:bench(test:update_counter(),2).
{129148,140627}
4> test:bench(test:update_counter(),20).
{471858,501198}

Do you have any idea why either 1) unique_integer([monotonic]) performs so well with a global lock, or 2) if it doesn't use a global lock, how it gets unique monotonic integers? Also, I don't suppose reading time should have a lock, but if it doesn't, why is it slower? – user3355020, Aug 18 '15 at 16:48
@user3355020: 1) A global lock is not such a big problem on HW with only two CPUs. 23796 μs (20 processes) means one invocation every 24 ns, which is about 58 CPU cycles on my HW. That is a reasonable value. 2) Reading the timer is slow because it does more work. Try the same test with more CPUs and you will see a difference. Two CPUs is simply too few. Anyway, if you need a **unique** and **monotonic** number, you don't have any other option. – Hynek -Pichi- Vychodil, Aug 18 '15 at 17:07
So you are saying you know for a definite fact that there is a global lock for unique_integer? I am trying to find out this information, and I can't find it anywhere, not even in the documentation, so I am left to speculate and conduct tests with inadequate testing hardware. If you could answer I would appreciate it. Thanks. – user3355020, Aug 18 '15 at 17:14
https://github.com/erlang/otp/blob/maint/erts/emulator/beam/erl_bif_unique.c#L354-L358 This is the code that does the update for monotonic unique integer. So it is an atomic increment operation. So it will be very fast and scale ok, but when you use more and more cores it will be worse and worse if you have many processes generating integers. – Lukas, Aug 18 '15 at 17:40
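
The per-call estimate in the comment above can be reproduced with a little arithmetic. The sketch below assumes 1,000,000 calls per run (the count given in the question; the Gist's actual count is not shown here) and the nominal 2.40 GHz clock from the /proc/cpuinfo output. The module name cost_sketch and its functions are hypothetical.

```erlang
-module(cost_sketch).
-export([ns_per_call/2, cycles_per_call/3]).

%% Average nanoseconds per call, given a total in microseconds.
ns_per_call(TotalMicros, Calls) ->
    TotalMicros * 1000 / Calls.

%% Average CPU cycles per call at a clock rate given in GHz
%% (1 GHz is one cycle per nanosecond).
cycles_per_call(TotalMicros, Calls, GHz) ->
    ns_per_call(TotalMicros, Calls) * GHz.
```

Under those assumptions, cost_sketch:ns_per_call(23796, 1000000) gives about 23.8 ns and cost_sketch:cycles_per_call(23796, 1000000, 2.4) gives about 57 cycles, in line with the "24 ns" and "about 58 cycles" quoted in the comment.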

score 1 · Answer 2 · answered Aug 19 '15 at 09:02

According to the Erlang code base, erlang:unique_integer([monotonic]) just increments an atomic integer, which is fast. It still imposes a memory barrier, but an atomic operation is cheap compared to a conventional global lock.

Lol4t0

So it uses some approach which is not a global lock? – user3355020 Aug 19 '15 at 10:22
@user3355020 The purpose of a resource lock is to acquire exclusive access to the resource, so that you can perform a series of actions that cannot be interrupted by other users of the resource. But if you can update the resource _atomically_, you just _do not need_ a lock. **But** you still have to invalidate the cache so that other users of the resource can see your changes. – Lol4t0 Aug 19 '15 at 10:54
@user3355020: From some point of view (SW), it is not a lock. From the other point of view (HW), it is a lock. So it depends on a detail. – Hynek -Pichi- Vychodil Aug 19 '15 at 12:47
@Lol4t0, I know what locks are for, and I know what atomic access is. But atomic access is often implemented with a lock. Locks and updating atomically are not mutually exclusive. – user3355020 Aug 19 '15 at 16:40
