
The function below normally returns a URL such as path.com/p/12345.

Sometimes, when a tweet contains a colon before the URL, such as

RT: Something path.com/p/123

the function will return:

personName:
path.com/p/12345

My function:

$a = 10

def grabTweets()
  tweet = Twitter.search("[pic] "+" path.com/p/", :rpp => $a, :result_type => "recent").map do |status|
    tweet = "#{status.text}" #class = string
    urls = URI::extract(tweet) #returns an array of strings
  end
end

My goal is to find any tweet with a colon before the URL and remove that result from the loop so that it is not returned to the array that is created.

Andrew Marshall
Zack Shapiro

1 Answer


You can restrict the extraction to HTTP URLs by passing the scheme as a second argument:

URI.extract("RT: Something http://path.com/p/123")
  # => ["RT:", "http://path.com/p/123"]

URI.extract("RT: Something http://path.com/p/123", "http")
  # => ["http://path.com/p/123"]

Your method can also be cleaned up quite a bit; it has a lot of superfluous local variables:

def grabTweets
  Twitter.search("[pic] "+" path.com/p/", :rpp => $a, :result_type => "recent").map do |status|
    URI.extract(status.text, "http")
  end
end

I also want to strongly discourage your use of a global variable ($a).
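One way to drop the global is to make the result count an ordinary argument. The following is only a sketch: `grab_tweet_urls`, the `Status` struct, and `fake_search` are hypothetical names invented so the example runs without the Twitter gem, and `flat_map` is used instead of `map` so the result is one flat array of URLs rather than an array of arrays.

```ruby
require "uri"

# Hypothetical stand-in for a search result; only #text is used here.
Status = Struct.new(:text)

# `search` is any callable that takes a count and returns objects
# responding to #text (an assumption, not the Twitter gem's API).
# The count is a keyword argument with a default instead of the global $a.
def grab_tweet_urls(search, count: 10)
  search.call(count).flat_map { |status| URI.extract(status.text, "http") }
end

# Fake search used purely for illustration.
fake_search = lambda do |count|
  [
    Status.new("RT: Something http://path.com/p/123"),
    Status.new("no link here"),
    Status.new("[pic] http://path.com/p/456")
  ].first(count)
end

urls = grab_tweet_urls(fake_search, count: 2)
```

In real code the lambda would wrap whatever search call you already make, so the count is visible at the call site instead of hidden in `$a`.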

Andrew Marshall
  • So just to clarify: adding `, "http"` after the text passed to `URI.extract` validates that the URL it's extracting has http in it before it's added to the array. Is that correct? – Zack Shapiro Feb 04 '12 at 03:15
  • Yup, though it checks that its scheme is what you're passing, so it's not just that "http" appears anywhere in the URI. You can also pass it an array of schemes, e.g. `["http", "ftp"]`, to include several. I'd normally say you could read more in [the documentation](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/URI.html#method-c-extract), but it's frustratingly brief. – Andrew Marshall Feb 04 '12 at 03:27
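To illustrate the scheme check discussed in the comments, here is a small sketch with a made-up sample string. Because the scheme must match exactly, `"http"` alone does not pick up `https` links; they have to be listed separately in the array.

```ruby
require "uri"

# Made-up sample text containing three different schemes.
text = "RT: see http://path.com/p/1 and https://path.com/p/2 or ftp://path.com/f"

# A single scheme: only URIs whose scheme is exactly "http".
http_only = URI.extract(text, "http")

# An array of schemes: URIs whose scheme is "http" or "https".
web_only = URI.extract(text, ["http", "https"])
```

Here `http_only` holds just the first URL and `web_only` holds the first two; the `RT:` prefix and the `ftp` link are skipped in both cases.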