1

When testing a method that change embedded soundcloud-track urls to an iframe. I got a bug when escaping the result.

Can someone explain why escaping the +token+ in CASE 2 breaks? Its okay for me to use the result as is and append to it.

The two cases are shown in detail below.

#ruby --version
# => ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12.2.1]

CASE 1

require 'cgi'

class String
  def embedda
    compiled = self
    compiled = soundcloud_replace(compiled)

    return compiled
  end

 private
  def soundcloud_replace(compiled)
    r = /(https?:\/\/(?:www.)?soundcloud.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?)/i
    compiled.gsub!(r) { |match| soundcloud_player(match) }

    return compiled
  end

  def soundcloud_player(token)
    # return token # => +matched soundcloud url+ in pry
    # puts token   # => +matched soundcloud url+ in pry

    url_encoded_string = CGI::escape(token)

    # puts url_encoded_string # => +escaped matched soundcloud url+

    # "<iframe width=\"100%\" height=\"166\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?url=#{token}\"></iframe>"
    # => Correct interpolated string with non-encoded url

    "<iframe width=\"100%\" height=\"166\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?url=#{url_encoded_string}\"></iframe>"
    # => Correct interpolated string with encoded url
  end
end

CASE 2

require 'cgi'

class String
  def embedda
    compiled = self
    compiled = soundcloud_replace(compiled)

    return compiled
  end

 private
  def soundcloud_replace(compiled)
    compiled.gsub!(/(https?:\/\/(?:www.)?soundcloud.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?)/i, soundcloud_player("\\1"))

    return compiled
  end

  def soundcloud_player(token)
    # return token # => +matched soundcloud url+  in pry
    # puts token   # => /1                        in pry

    url_encoded_string = CGI::escape(token)

    # puts url_encoded_string = CGI::escape(token) # => %5C1  in pry

    # "<iframe width=\"100%\" height=\"166\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?url=#{token}\"></iframe>"
    # => Correct interpolated string with non-encoded url

    "<iframe width=\"100%\" height=\"166\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?url=#{url_encoded_string}\"></iframe>"
    # => Interpolated string with ...?url=%5C1
  end
end
Takle
  • 13
  • 1
  • 2
  • 1
    I honestly have limited experience with Ruby. But from what I can see, you are passing "\\1" into your soundcloud_player method in Case 2. Does that mean Match Group Number 1? If so, that is incorrect, as a full match is actually 0-based. And because you have no capture groups, \\1 is nothing. – Suamere Apr 09 '13 at 23:52
  • 1
    Side note, try adding the "n" flag (thus making it /in). If every capture-group has (?:) in it, so you don't want to capture any particular group, use the "n" flag and get rid of all that (?:) mess. "n" = explicit capture only. (Will only capture if you explicitly name the group). it's the same as putting (?:) in every capture group as is the case here. – Suamere Apr 09 '13 at 23:55
  • It is indeed a bit strange, you can see the code here: https://github.com/kaspergrubbe/embedda/blob/master/lib/embedda.rb and the specs for it here: https://github.com/kaspergrubbe/embedda/blob/master/spec/embedda_spec.rb it does indeed work with the other regexes. – Kasper Grubbe Apr 09 '13 at 23:59
  • 1
    Aha, then it is exactly as I said. Look up in the other compile calls. When they use \\1, they are calling Capture Group 1 (Just the Digits or ID) part of the match. Try using \\0 if you want the whole match. Edit: Not knowing Ruby, I don't know if \\0 will work, but however you call the result of the entire match is what you want. \\1 calls "Capture Group 1", which doesn't exist in the given regex. – Suamere Apr 10 '13 at 00:05
  • They should function as a backreference in gsub and sub, see http://stackoverflow.com/questions/288573/1-and-1-in-ruby I can't remember where I saw the meaning of prepending another \ sadly. To to be sure I tried with \\0 and it behaves the same way as \\1 as shown in the test CASE 2 except that puts token => /0 I just find it so odd that it actually returns the correct string if I'm not escaping it. I might be easier just to change it to something more readable :) – Takle Apr 10 '13 at 00:13
  • 1
    Right. In the given regex there is no backreferenceable group. So \\1 is certainly wrong. Being that I'm a ruby nub, I couldn't tell you what's right. I do know that this was fun to get into ruby. I'm going to play with it more. – Suamere Apr 10 '13 at 00:17
  • 1
    I believe I was wrong. The entire group is indeed in a capture group with the \(...)\i I was under the impression that those were just the delimiters, but just \...\i is the delimiter, so the ( and ) actually are capturing as group 1. Could you post what the text is that you're running this regex against? That's the next step. – Suamere Apr 10 '13 at 00:23
  • You can see a test on http://rubular.com/r/THYI4Oa49u - I don't think the n-flag is available. I appreciate your input :) – Takle Apr 10 '13 at 00:29
  • I like this syntax better, so I will go with that - but I still think that something is off. Because it only breaks if I try to encode the url. gsub!(pattern) → an_enumerator ; pattern = /my_pattern/ ; output = input.gsub!(awesome_pattern).each { |match| my_method(match) } – Takle Apr 10 '13 at 01:17

2 Answers2

1

In the second case, you are escaping "\1" before giving it to gsub. After escaping "url_encoded_string" contains "%5C1". Since the "\1" is not present, nothing will be inserted. Instead the content will be replaced with the content of "url_encoded_string".

Try the following and see the difference in case 2:

url_encoded_string = token

Instead of:

url_encoded_string = CGI::escape(token)
kaspernj
  • 1,243
  • 11
  • 16
0

Simpler version for showing the issues:

Case 1:

require 'cgi'

def soundcloud_player(token)
  url_encoded_string = CGI::escape(token)
  "<iframe width=\"100%\" height=\"166\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?url=#{url_encoded_string}\"></iframe>"
end

soundcloud_string = "https://soundcloud.com/theeconomist/money-talks-april-8th-2013"
r = /(https?:\/\/(?:www.)?soundcloud.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?)/i
soundcloud_string.gsub!(r) { |match| soundcloud_player(match) }
puts soundcloud_string.inspect
# => Correct interpolated string with encoded url:
# "<iframe width=\"100%\" height=\"166\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?url=https%3A%2F%2Fsoundcloud.com%2Ftheeconomist%2Fmoney-talks-april-8th-2013\"></iframe>"

Case 2:

require 'cgi'

def soundcloud_player(token)
  url_encoded_string = CGI::escape(token)
  "<iframe width=\"100%\" height=\"166\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?url=#{url_encoded_string}\"></iframe>"
end

soundcloud_string = "https://soundcloud.com/theeconomist/money-talks-april-8th-2013"
soundcloud_string.gsub!(/(https?:\/\/(?:www.)?soundcloud.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?)/i, soundcloud_player("\\1"))
puts soundcloud_string.inspect
# => Interpolated string with ...?url=%5C1:
# "<iframe width=\"100%\" height=\"166\" scrolling=\"no\" frameborder=\"no\" src=\"https://w.soundcloud.com/player/?url=%5C1\"></iframe>"
Kasper Grubbe
  • 923
  • 2
  • 14
  • 19