0

NOTE: I'm choosing to use threading to resolve DNS names, but the same behavior can likely be reproduced with any type of similar operation.


I am receiving unexpected results when I trying to move my (previously working) code from standard single-threaded execution to multi-threading. Specifically, my code iterates over an array of hashes and adds a key/value pair to each hash in the array.

The problem I am having appears to be coming from the dns_cname.map loop where the new key/value pair is being created. Instead of the "external_dns_entry" key having the correct value (that is, result.name.to_s which contains the name resolved by DNS), I am getting a name for one of the many other servers in url_nameserver_mapping instead.

I have a feeling that the DNS resolutions are happening as threads become available and the hash is getting updated out of order, but I don't even know how to begin to track an issue like this down.

Problematic Results: The DNS resolution ran against server1 is mapping to server 17. Likewise, server 17 is mapping to server 99, etc. The rest of the Hash remains in tact.

Any help is GREATLY appreciated. Thanks very much in advance!

Here is my code when multi-threading is NOT enabled (WORKS PROPERLY):

url_nameserver_mapping = { "server1" => "dallasdns.dns.com",
                           "server2" => "portlanddns.dns.com",
                           "server3" => "losangelesdns.dns.com" }


# Parse the JSON string response from the API into a valid Ruby Hash
# The net/http GET request is not shown here for brevity but it was stored in 'response'
unsorted_urls = JSON.parse(response.body)

# Sort (not sure this is relevant)
# I left it since my data is being populated to the Hash incorrectly (w/ threading enabled)
url_properties = unsorted_urls['hostnames']['items'].sort_by { |k| k["server"]}

url_nameserver_mapping.each do |server,location|

      dns = Resolv::DNS.new(:nameserver => ['8.8.8.8'])
      dns_cname = dns.getresources(server, Resolv::DNS::Resource::IN::CNAME)

      dns_cname.map do |result|
         # Create a new key/value for each Hash in url_properties Array
         # Occurs if the server compared matches the value of url['server'] key
         url_properties.each do |url|
           url["external_dns_entry"] = result.name.to_s if url['server'] == server
         end
      end

end

I followed the guide at https://blog.engineyard.cm/2013/ruby-concurrency to implement the producer/consumer threading model.

Here is my adapted code when multi-threading IS enabled (NOT WORKING):

require 'thread'
require 'monitor'

thread_count = 8
threads = Array.new(thread_count)
producer_queue = SizedQueue.new(thread_count)
threads.extend(MonitorMixin)
threads_available = threads.new_cond
sysexit = false

url_nameserver_mapping = { "server1" => "dallasdns.dns.com",
                           "server2" => "portlanddns.dns.com",
                           "server3" => "losangelesdns.dns.com" }


unsorted_urls = JSON.parse(response.body)

url_properties = unsorted_urls['hostnames']['items'].sort_by { |k| k["server"]}

####################
##### Consumer #####
####################

consumer_thread = Thread.new do

  loop do

    break if sysexit && producer_queue.length == 0
    found_index = nil

    threads.synchronize do
      threads_available.wait_while do
        threads.select { |thread| thread.nil? ||
                                  thread.status == false ||
                                  thread["finished"].nil? == false}.length == 0
      end
      # Get the index of the available thread
      found_index = threads.rindex { |thread| thread.nil? ||
                                              thread.status == false ||
                                              thread["finished"].nil? == false }
    end

    @domain = producer_queue.pop

      threads[found_index] = Thread.new(@domain) do

        dns = Resolv::DNS.new(:nameserver => ['8.8.8.8'])
        dns_cname = dns.getresources(@domain, Resolv::DNS::Resource::IN::CNAME)

        dns_cname.map do |result|
           url_properties.each do |url|
             url["external_dns_entry"] = result.name.to_s if url['server'] == @domain
           end
        end

        Thread.current["finished"] = true

        # Notify the consumer that another batch of work has been completed
        threads.synchronize { threads_available.signal }
      end
  end
end

####################
##### Producer #####
####################

producer_thread = Thread.new do

  url_nameserver_mapping.each do |server,location|

    producer_queue << server

    threads.synchronize do
      threads_available.signal
    end
  end
  sysexit = true
end

# Join on both the producer and consumer threads so the main thread doesn't exit
producer_thread.join
consumer_thread.join

# Join on the child processes to allow them to finish
threads.each do |thread|
  thread.join unless thread.nil?
end
Kurt W
  • 321
  • 2
  • 15
  • Please let me know if it would help if I posted the data structure in my `unsorted_urls` Array of Hashes. It is just a bunch of hashes with data that isn't relevant minus a `server` key which has server name value that is used for comparison. I spent over an hour crafting my question and trying to keep it concise--I left it out for brevity. – Kurt W Jun 04 '16 at 00:37
  • Is there a dangling `}` in `url["external_dns_entry"] = result.name.to_s if url['server'] == server }`? – Aetherus Jun 04 '16 at 00:52
  • Yes, thanks for catching that! I was moving that line from a single liner to multiple so folks didn't have to scroll too much. I'll fix it now. – Kurt W Jun 04 '16 at 01:04
  • `dns_cname = dns.getresources(server, Resolv::DNS::Resource::IN::CNAME)` Should it be `dns_cname = dns.getresources(location, Resolv::DNS::Resource::IN::CNAME)`? – Aetherus Jun 04 '16 at 01:23
  • Nope, that part is actually correct, but thanks for looking into this. Right now, I haven't implemented use of the values in that hash--just the keys. I'll add that later. Long story short, the key is being resolved as opposed to the value. – Kurt W Jun 04 '16 at 02:27

1 Answers1

0

@domain is shared by all the threads - this sharing is the root of your problem: when it is updated by popping the next unit of work from the queue, all your threads see that change. You can avoid this problem by doing

Thread.new(producer_queue.pop) do |domain|
   #domain isn't shared with anyone (as long as there
   #is no local variable called domain in the enclosing scope
end

Tangential to your question, but this seems liked a really overengineered approach. Much easier to spin up a bunch of consumer threads ahead of time and have them read directly from the queue of work.

Frederick Cheung
  • 83,189
  • 8
  • 152
  • 174
  • Thanks for this! If the example I pulled from that site is too contrived, do you have a suggestion on how I might do what you said (spin up consumer threads in advance)? Thank you! – Kurt W Jun 04 '16 at 23:08
  • Are you saying I should just do: `threads[found_index] = Thread.new(producer_queue.pop) do` instead of: `threads[found_index] = Thread.new(@domain) do` - - - how are these different? – Kurt W Jun 04 '16 at 23:13
  • The main difference is that instead of your threads sharing @domain (every time a new thread is starting you are changing that value for all threads) my example creates a non-shared local variable. Completely rewriting your threading code is a bit out of scope here and wouldn't help clear up your misunderstanding. – Frederick Cheung Jun 05 '16 at 08:05
  • Thank you. I will try this tomorrow and mark it the correct answer if all works out. Really appreciate you taking the time to help me! – Kurt W Jun 05 '16 at 16:59
  • This was such a simple fix... can't believe I didn't see it. Thanks! – Kurt W Jun 06 '16 at 23:51