3

I was looking to implement Python's os.walk method in Crystal. I was trying to do it recursively, but the compiler was told me to be careful about recursive yielding because it recursively/infinitely would generate code when compiled. This is what I had

def walk(d = @root, &block)
  d = Dir.new(d) if d.is_a?(String)
  dirs, files = d.entries.partition { |s| Dir.exists?(File.join(d.path, s)) }
  if Dir.exists?(d.path)
    yield d.path, dirs, files
    dirs.each do |dir_name|
      # recursively yield
      walk File.join(d.path, dir_name), do |a, b, c|
        yield a, b, c
      end
    end
  end
end
mlobl
  • 155
  • 1
  • 11

1 Answers1

5

A few helpful members of the community on Gitter pointed me in the right direction and just wanted to share my learnings here. The answer is, is that you cannot use yield recursively but you must use the block variable instead (explanation to come). Here's what I ended up with:

def walk(d = @root, &block : String, Array(String), Array(String) -> )
  d = Dir.new(d) if d.is_a?(String)
  dirs, files = d.children.partition { |s| Dir.exists?(File.join(d.path, s)) }
  block.call(d.path, dirs, files)
  dirs.each do |dir_name|
    walk File.join(d.path, dir_name), &block
  end
end

The trick here is that instead of using the yield keyword you have to use block.call instead and forward your block. This is actually already in the docs, but it's a little subtle. During compilation, if you have a yield, the compiler will literally inline your block where the yield is (as far as I understand). When using block.call, a function is created instead, and that's the reason we need to type the block argument. If you don't give it a type, block.call will expect 0 arguments. To pass things through, just type it similar to how I did it in this method signature.

Based on the above explanation, it makes sense why you wouldn't need to add a type to block when you just yield and it would just work. It's also important to understand why there's a performance difference between yield and block.call, since in one case a closing function is created versus the compiler inlining your code.

mlobl
  • 155
  • 1
  • 11
  • With `Dir#each_child` instead of `Dir#entries` you don't need to filter out `.` and `..`. Also it's not logical to call `Dir.exists?` after you already have created an iterator for the entries in that path. – Johannes Müller Mar 22 '18 at 20:30
  • Took another look and ```Dir#children``` was a drop in replacement for ```Dir#entries``` - thanks! As for the ```Dir#exists```, i'm not checking for existence but rather is the path a directory. If there's a better way of doing that, let me know, but looking at the code for ```Dir#exists``` it doesn't look like it's doing much other than a wrapper around File.stat and checking if it's a directory. Dunno how I can check if a path is a directory in a cheaper way – mlobl Mar 26 '18 at 15:51
  • You don't need to check if `d.path` exists. If it wouldn't, `Dir.new` had already raised. – Johannes Müller Mar 26 '18 at 15:58
  • Oh, I see what you mean now (was looking at the wrong line). Good point, thanks. Just updated – mlobl Mar 26 '18 at 16:18