I am trying to scrape some data. `b` is the number of pages needed to scrape 1000 items, at 22 items per page.

usernum = 1000
b = usernum.to_i/22
Array.new(b) { |i| i + 1 }

This gives me an array of the page numbers I need to scrape, [1, 2, ..., 44, 45] (1000 / 22 = 45 with integer division), where each element corresponds to a page to be downloaded with Nokogiri. I wasn't sure how to proceed from here. The URLs follow this pattern:

www.google.com&page=1
www.google.com&page=2

and so on. In this case, the last URL would be "www.google.com&page=45".

Is it possible to prepend the URL prefix to each element of the array? If so, does it make more sense to build the URLs from the array and download each page sequentially, or to compile a list of the URLs in an external text file and load them into a method? I also plan to add threading.

sawa
user2208607

1 Answer

Could you loop through instead of creating an array?

(1..b).each do |page|
  url = "http://google.com?page=#{page}"
  # .. fetch the page
end
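If the URLs are wanted as an array after all (for example, to hand out to threads later), `map` over the same range does the job. This is only a sketch: `per_page`, `page_count`, `base_url`, and `urls` are names made up for illustration, and the Google URL is the placeholder from above. It also rounds the page count up, since 45 pages of 22 items cover only 990 of the 1000 items.

```ruby
usernum  = 1000
per_page = 22

# Integer division (1000 / 22) gives 45 pages, i.e. 990 items.
# Rounding up gives 46 pages, enough to cover all 1000.
page_count = (usernum.to_f / per_page).ceil

# Placeholder prefix; substitute the real site.
base_url = "http://google.com?page="

# Prepend the prefix to every page number.
urls = (1..page_count).map { |page| "#{base_url}#{page}" }
```

`urls.first` is then "http://google.com?page=1", and the array can be iterated with `each` just like the range.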
ramblex
  • Silly me... That's the way to do it! But if I'm going to multithread the scraping, do I not need to put the completed (prefix + numbered) URLs into an array? – user2208607 Mar 25 '13 at 18:14
  • You can create a thread within the each loop if you want a thread for each request. If you want fewer threads you might want to look at `each_slice`. – ramblex Mar 25 '13 at 18:22
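One way the `each_slice` suggestion could look is sketched below: one thread per URL, but only `batch_size` threads alive at a time. `fetch_in_batches` is a hypothetical helper, not part of any library, and the block passed to it is a stand-in for the real download so that the example runs without network access.

```ruby
# Hypothetical helper: calls the block once per URL, in batches of
# `batch_size` concurrent threads. Results keep the order of `urls`.
def fetch_in_batches(urls, batch_size, &fetch)
  results = []
  urls.each_slice(batch_size) do |batch|
    # Start one thread for each URL in the current batch.
    threads = batch.map { |url| Thread.new { fetch.call(url) } }
    # Thread#value waits for the thread and returns the block's result.
    results.concat(threads.map(&:value))
  end
  results
end

urls = (1..45).map { |page| "http://google.com?page=#{page}" }

pages = fetch_in_batches(urls, 5) do |url|
  # Stand-in for the real request, e.g. Nokogiri::HTML(URI.open(url))
  # with nokogiri and open-uri required.
  "fetched #{url}"
end
```

Each batch is joined before the next one starts, so a slow page holds up its whole batch; a queue with a fixed pool of worker threads would avoid that at the cost of a little more code.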