
Ruby 1.9.3, net-ssh 2.9.2

I am working on a project in which I need to diff the same directory (and its subdirectories) on two different servers (local and remote). From there, I need to copy the most recently modified version of each file to the other server, and delete a file from the remote if it is not present on the local.

NOTE: I cannot use rsync. We are backing up Asterisk-related directories to GlusterFS. At thousands of files, rsync comparing local to the Gluster volume is very slow (when we need it under 1 minute).

Here is my current code. I am omitting my work for copying/removing files, as I want to take this one step at a time.

require 'thread'
require 'date'
require 'rubygems'
require 'net/ssh'

SERVERS = ['local17', 'development']
CLIENT = SERVERS[0]
CLIENT_PATH = '/home/hstevens/temp_gfs'
BRICK_PATH = '/export/hunter_test'

@files = {
  SERVERS[0] => {},
  SERVERS[1] => {}
}

def grab_filenames_and_dates(files, server)
  files = files.reject { |x| File.directory? x }
  files.each do |file|
    name = `ls --full-time "#{file}" | awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print $0}'`.strip
    date = `ls --full-time "#{file}" | awk '{print $6, $7, $8}'`.strip
    @files[server][name] = DateTime.parse(date)
  end
end

# Collect diff information on all servers
ls_threads = SERVERS.map do |server|
  Thread.new do
    if server == CLIENT
      files = Dir.glob("#{CLIENT_PATH}/**/*")
      grab_filenames_and_dates(files, server)
    else
      Net::SSH.start(server, 'hstevens') do |session|
        files = session.exec!(%Q(ruby -e 'puts Dir.glob("#{BRICK_PATH}/**/*")')).split("\n")
        grab_filenames_and_dates(files, server)
      end
    end
  end
end
ls_threads.each(&:join)

When I run my program, it works for the local server (CLIENT/local17), but fails on the remote server. I tried debugging statements (printing pwd to the console), and it appears that although the method is called inside the Net::SSH session block, it is acting on my local server.

ls: cannot access /export/hunter_test/sorttable.js: No such file or directory
ls: cannot access /export/hunter_test/sorttable.js: No such file or directory
./gluster_rsync.rb:36:in `parse': invalid date (ArgumentError)
    from ./gluster_rsync.rb:36:in `block in grab_filenames_and_dates'
    from ./gluster_rsync.rb:33:in `each'
    from ./gluster_rsync.rb:33:in `grab_filenames_and_dates'
    from ./gluster_rsync.rb:53:in `block (3 levels) in <main>'
    from /usr/local/lib/ruby/gems/1.9.1/gems/net-ssh-2.9.2/lib/net/ssh.rb:215:in `start'
    from ./gluster_rsync.rb:51:in `block (2 levels) in <main>'

How can I properly wrap a method call inside a Net::SSH session?

onebree
  • Why don't you use tools designed for this? [`rsync`](http://linux.die.net/man/1/rsync) can do much, if not all of it, is battle-tested, and is a standard part of *nix systems. – the Tin Man Aug 13 '15 at 20:21
  • I have been getting this a lot on the SO chat. Adding the reason to the question. But basically, rsync + gluster + many files = slow. – onebree Aug 13 '15 at 20:24
  • rsync is going to be a lot faster than what you write in Ruby. rsync is compiled code, written for a specific purpose. If speed isn't important then you can do it in Ruby and it'll do the job. You can also look at [lsyncd](https://code.google.com/p/lsyncd/) which runs in the background watching for file changes and immediately syncs them. I have directories containing 1800+ sub-dirs and 764K+ files and maintained mirrors using lsyncd. It processes in near realtime (syncs within seconds). – the Tin Man Aug 13 '15 at 21:53

3 Answers


I'm 100% NOT trolling you ... but ... your synopsis is the very reason rsync was created: moving files between servers efficiently, with diff capability.

IMO it's a bit misguided to think you can do better than 20 years of battle-tested C code, which FWIW will execute much faster than Ruby code. That is probably why so many are rallying to rsync as the solution.

Although rsync is single-threaded... ask yourself why that is... just because you can multi-thread in Ruby doesn't mean that you should. It's going to open a whole other spaghetti monster, and you'll soon find yourself tasked with "handling" duplicates or incorrect versions. See MongoDB discussions on atomicity. You won't even get close to atomic in Ruby, so it WILL be an issue.

I would be sure to use a thread-safe language if you want to go down that route, at the least JRuby. FWIW thread safety was one of the many reasons Jose created Elixir, as he was exasperated by Ruby not truly having it.

However, IMO something is wrong with your approach and you need to take a few steps back and look at the problem holistically, e.g. maybe there is a similar solution to GlusterFS that can handle the dedup on the FS level, or maybe you need to handle file addition through an API or some sort of queuing system that will process the files in a sequential order. It may require a larger change than you're willing or able to make though, so if it were me, I would be hesitant to just cowboy code something up in Ruby, because some developer is going to end up jumping into that code someday and facepalm instantly.

Multithread rsync not ruby

The only solution I can readily come up with is to focus on making the rsync transfer faster.

  1. Perhaps you can speed rsync up with threads instead

  2. Or use this person's approach. This does seem to be an issue with GlusterFS, but rsync with the proper flags can do the differential sync better. Then your Ruby scripts could pick up the files from the master source.

engineerDave
  • I know you are not trolling, but I was assigned this task because rsync and GlusterFS, in our case, do not play well together. – onebree Aug 14 '15 at 11:51
  • Thank you for the advice. I upvoted you because I agree, and am looking further into running rsync in threads. I will still pursue my current project, but I want to understand rsync more. – onebree Aug 14 '15 at 13:27
  • I do feel your pain though re: inheriting projects with crazy parameters from mgmt. Good luck! – engineerDave Aug 14 '15 at 16:12

Ruby code running inside the Net::SSH block still runs on your computer (this includes methods that run commands, like system or backticks).

To execute a command on the remote server you need to use session.exec or session.exec! (the latter is blocking, the former requires you to run the SSH event loop). You can also open a channel explicitly and execute a command there; these methods are convenience wrappers.

There is no special support for running Ruby remotely. You can of course use exec! to run ruby on the other machine (assuming it is installed), but that's it.
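To illustrate, here is a minimal sketch of getting a file listing from the remote machine by sending one shell command through exec! and doing the rest locally. The helper name remote_files is made up for this example, `session` is assumed to be the object yielded by Net::SSH.start, and the `find` command assumes a Unix-like remote host.

```ruby
require 'shellwords'

# Hypothetical helper: `session` is assumed to be a Net::SSH session
# (anything that responds to exec! works for this sketch).
def remote_files(session, dir)
  # Only this command string runs on the remote machine.
  command = "find #{Shellwords.escape(dir)} -type f"
  # exec! blocks until the command finishes; to_s guards against a nil
  # result when the command prints nothing.
  output = session.exec!(command).to_s
  # Everything from here on is ordinary Ruby running locally on the
  # captured text. A call like File.directory?(path) here would check
  # the local filesystem, not the remote one.
  output.split("\n")
end

# Possible usage, with the names from the question:
#   Net::SSH.start('development', 'hstevens') do |session|
#     files = remote_files(session, BRICK_PATH)
#   end
```

Filtering with `-type f` on the remote side avoids needing any local check of whether a remote path is a directory.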

Frederick Cheung
  • This does not answer my question. While SSH'd into another computer, I need to execute a method I defined in my program above. How do I open a channel? The docs (v1 website) are a little confusing to me. – onebree Aug 14 '15 at 11:51
  • Net ssh v1 is ancient. On v2 you open a channel with the open_channel method but like I said you can only execute shell commands on the remote machine, not ruby methods (except in so far as your shell command might be to run ruby) – Frederick Cheung Aug 14 '15 at 12:33
  • v1 docs were better for me, as they were laid out in chapters. For full syntax I read the v2 docs. I understand now -- I did not read it as how you explained it in your comment above – onebree Aug 14 '15 at 13:11
  • Thank you. I accepted you for saying I cannot run methods, only shell commands. – onebree Aug 14 '15 at 13:26

The accepted answer helped me arrive at the following solution. Knowing that session.exec!() only runs shell commands, I decided to split the method (see question) into multiple steps within the SSH block.

Thread.new do
  files = nil
  Net::SSH.start(server, 'hstevens') do |session|
    # List regular files on the remote host. Directories are filtered out
    # inside the remote ruby command, because File.directory? called here
    # would check the local filesystem instead.
    files = session.exec!(%Q(cd "#{BRICK_PATH}" ; ruby -e 'puts Dir.glob("**/*").reject { |f| File.directory?(f) }')).split("\n")
    files.each do |file|
      # Both ls calls run on the remote host via exec!
      name = session.exec!(%Q(ls --full-time "#{BRICK_PATH}/#{file}" | awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print $0}')).strip
      date = session.exec!(%Q(ls --full-time "#{BRICK_PATH}/#{file}" | awk '{print $6, $7, $8}')).strip
      @files[server][name] = DateTime.parse(date)
    end
  end
end

I do not know yet if this proves faster (I need to run a benchmark), but it is definitely better than shelling out to ssh in several system() calls.

onebree
  • If speed matters, consider just doing one ls command, not piping it through awk, and then parse it in Ruby to extract the name and date. String#split will probably suffice to replace most of what awk is doing. – Wayne Conrad Aug 14 '15 at 17:37
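
The suggestion in the last comment could look roughly like the sketch below: one remote command for the whole tree, with String#split replacing awk. The helper names are invented for this example, `session` is assumed to be a Net::SSH session, and the parsing assumes GNU `ls --full-time` output with the same field layout the awk scripts above rely on (eight fields before the name). File names containing newlines are not handled.

```ruby
require 'date'
require 'shellwords'

# Hypothetical helper: parse one line of `ls --full-time` output.
# Assumed fields: mode, links, owner, group, size, date, time, zone, name.
def parse_full_time_line(line)
  # The limit of 9 keeps any spaces in the file name inside the last field.
  fields = line.chomp.split(' ', 9)
  return nil if fields.length < 9 # blank lines, "total N" lines, etc.
  name  = fields[8]
  mtime = DateTime.parse(fields[5..7].join(' '))
  [name, mtime]
end

# Hypothetical helper: `session` is assumed to respond to exec!
# the way a Net::SSH session does.
def remote_mtimes(session, dir)
  # A single round trip: find lists regular files, ls prints their times.
  command = "find #{Shellwords.escape(dir)} -type f -exec ls --full-time {} +"
  mtimes = {}
  session.exec!(command).to_s.each_line do |line|
    name, mtime = parse_full_time_line(line)
    mtimes[name] = mtime if name
  end
  mtimes
end
```

This replaces two exec! calls per file with one call per directory tree, which is where most of the time is likely to go. Whether it meets the one-minute target would still need a benchmark.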