I've written some Ruby code that processes the items of an array via a thread pool. It preallocates a results array of the same size as the passed-in array. Inside the thread pool, each thread assigns into the preallocated array, and the indexes it writes to are guaranteed to be unique. With that in mind, do I need to wrap the assignment in Mutex#synchronize?

Example:

SIZE = 1_000_000_000
def collect_via_threadpool(items, pool_count = 10)
  processed_items = Array.new(items.count, nil)
  index = -1
  length = items.length
  mutex = Mutex.new
  [pool_count, length, 50].min.times.collect do
    Thread.start do
      # Claim the next index under the lock, so each index is handed out only once.
      while (i = mutex.synchronize { index += 1 }) < length
        processed_items[i] = yield(items[i])
        # ^ do I need to synchronize around this? `processed_items` is preallocated
      end
    end
  end.each(&:join)
  processed_items
end

items = collect_via_threadpool(SIZE.times.to_a, 100) do |item|
  item.to_s
end

raise unless items.size == SIZE

items.each_with_index do |item, index|
  raise unless item.to_i == index
end

puts 'success'

(This test code takes a long time to run, but appears to print 'success' every time.)

It seems like I would want to surround the Array#[]= with Mutex#synchronize just to be safe, but my question is:

Is this code defined as safe within Ruby's specification?

Michael Bishop
  • It depends on which implementations of Ruby you're going to use. MRI Ruby (classic one) has a Global Interpreter Lock. I don't think Ruby itself places any guarantees on this, though I can't give a good quote on that (that's why it's not an answer). – D-side Oct 28 '14 at 15:31

1 Answer


Nothing in Ruby is specified to be thread safe other than Mutex (and thus anything derived from it). If you want to know if your specific code is thread safe, you'll need to look at how your implementation handles threads and arrays.
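As a starting point for that investigation, a minimal sketch that reports which implementation is actually running the code. It only uses the built-in constants RUBY_ENGINE, RUBY_VERSION and RUBY_DESCRIPTION; the variable name `implementation` is just for illustration.

```ruby
# Report which Ruby implementation is running this process.
# RUBY_ENGINE is "ruby" on MRI; other implementations report their own
# name (for example "jruby" or "truffleruby").
implementation = RUBY_ENGINE
puts "#{implementation} #{RUBY_VERSION}"
puts RUBY_DESCRIPTION
```

Knowing the engine only tells you whose threading and Array internals to read up on; it does not by itself say whether the unsynchronized write is safe.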

For MRI, calling Array.new(n, nil) does actually allocate memory for the entire array, so if your threads are guaranteed to not share indices your code will work. It's as safe as having multiple threads operate on distinct variables without a mutex.

However, for other implementations, Array.new(n, nil) might not allocate the whole array up front, and assigning to indices later could involve reallocations and memory copies, which could break catastrophically.

So while your code may work (in MRI at least), don't rely on it. While we're on the topic, Ruby's threads aren't even specified to actually run in parallel. So if you're trying to avoid mutexes because you think you might see some performance boost, maybe you should rethink your approach.
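For reference, a minimal sketch of what the synchronized variant might look like. It follows the structure of the code in the question; the method name `collect_via_threadpool_synchronized` and the `results_mutex` variable are introduced here for illustration only. The block is called outside the lock, so only the array write itself is serialized.

```ruby
# Sketch: the pool from the question, with the array write guarded by a
# second mutex. Names other than those in the question are illustrative.
def collect_via_threadpool_synchronized(items, pool_count = 10, &block)
  length = items.length
  processed_items = Array.new(length)
  index = -1
  index_mutex = Mutex.new    # guards the shared work counter
  results_mutex = Mutex.new  # guards writes to processed_items

  workers = [pool_count, length, 50].min.times.map do
    Thread.new do
      # Claim the next index under the lock; stop once the input is exhausted.
      while (i = index_mutex.synchronize { index += 1 }) < length
        # Do the (possibly slow) work outside the lock. `items` is only read here.
        result = block.call(items[i])
        # Hold the lock just long enough to store the result.
        results_mutex.synchronize { processed_items[i] = result }
      end
    end
  end
  workers.each(&:join)
  processed_items
end

squares = collect_via_threadpool_synchronized((0...1_000).to_a, 8) { |n| n * n }
```

Because the lock is held only for the assignment, any time spent in the block (IO, for example) still overlaps across threads.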

Max
  • I'm really asking about the code in `#collect_via_threadpool` itself. The block that is passed to it could be doing IO, in which case there is a benefit to the threadpool. – Michael Bishop Oct 28 '14 at 16:27
  • My point was that if there is a benefit to the threadpool, it should come from the work the threads are doing, not from writing to a shared array without a mutex. It sounds like in your case you could add a mutex to get _guaranteed_ thread safety with _negligible_ performance penalties. Why not do it? – Max Oct 28 '14 at 16:50
  • No, I definitely think you are right, it's just that the question isn't about "what's the best thing to do in this case?". It's more a general question of whether or not Ruby supports this. It sounds like "no". – Michael Bishop Oct 28 '14 at 17:31