
I want to pass the array of URLs returned by my first function into my second function, but I am unsure how to do this.

require 'open-uri'
require 'nokogiri'
require 'byebug'

def fetch_recipe_urls
  base_url = 'https://cooking.nytimes.com'
  easy_recipe_url = 'https://cooking.nytimes.com/search?q=easy'
  easy_searchpage = Nokogiri::HTML(open(easy_recipe_url))
  recipes = easy_searchpage.search('//article[@class="card recipe-card"]/@data-url')
  recipes_url_array = recipes.map do |recipe|
    uri = URI.parse(recipe.text)
    uri.scheme = "http"
    uri.host = "cooking.nytimes.com"
    uri.query = nil
    uri.to_s
  end

end

def scraper(url)
  html_file = open(url).read
  html_doc = Nokogiri::HTML(html_file)
  recipes = Array.new
  recipe = {
    title: html_doc.css('h1.recipe-title').text.strip,
    time: html_doc.css('span.recipe-yield-value').text.split("servings")[1],
    steps: html_doc.css('ol.recipe-steps').text.split.join(" "),
    ingredients: html_doc.css('ul.recipe-ingredients').text.split.join(" ")
  }

  recipes << recipe
end

2 Answers


Since fetch_recipe_urls already returns an Array, you can iterate over it and call scraper for each URL:

def scraper(url)
  html_file = open(url).read
  html_doc = Nokogiri::HTML(html_file)

  {
    title: html_doc.css('h1.recipe-title').text.strip,
    time: html_doc.css('span.recipe-yield-value').text.split("servings")[1],
    steps: html_doc.css('ol.recipe-steps').text.split.join(" "),
    ingredients: html_doc.css('ul.recipe-ingredients').text.split.join(" ")
  }
end

fetch_recipe_urls.map { |url| scraper(url) }

But I'd actually structure the code to be something like:

require 'open-uri'
require 'nokogiri'

BASE_URL = 'https://cooking.nytimes.com/'

def fetch_recipe_urls
  page = Nokogiri::HTML(open(BASE_URL + 'search?q=easy'))
  recipes = page.search('//article[@class="card recipe-card"]/@data-url')
  # URI.join avoids a doubled slash when data-url already starts with "/"
  recipes.map { |recipe_node| URI.join(BASE_URL, recipe_node.text).to_s }
end

def scrape(url)
  html_doc = Nokogiri::HTML(open(url).read)

  {
    title: html_doc.css('h1.recipe-title').text.strip,
    time: html_doc.css('span.recipe-yield-value').text.split("servings")[1],
    steps: html_doc.css('ol.recipe-steps').text.split.join(" "),
    ingredients: html_doc.css('ul.recipe-ingredients').text.split.join(" ")
  }
end

fetch_recipe_urls.map { |url| scrape(url) }

You can also call scrape/scraper inside fetch_recipe_urls, but I recommend a single responsibility approach. A better idea would be to make this OOP and build a Scraper class and a CookingRecipe class to be more idiomatic; a rough sketch follows.
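
As a minimal sketch of that shape (the names CookingRecipe, Scraper, and #call are hypothetical, not something the answer prescribes):

require 'open-uri'
require 'nokogiri'

# Hypothetical value object holding one scraped recipe
CookingRecipe = Struct.new(:title, :time, :steps, :ingredients)

# Hypothetical class with a single responsibility: fetch and parse one URL
class Scraper
  def initialize(url)
    @url = url
  end

  def call
    html_doc = Nokogiri::HTML(open(@url).read) # use URI.open on Ruby 2.7+
    CookingRecipe.new(
      html_doc.css('h1.recipe-title').text.strip,
      html_doc.css('span.recipe-yield-value').text.split("servings")[1],
      html_doc.css('ol.recipe-steps').text.split.join(" "),
      html_doc.css('ul.recipe-ingredients').text.split.join(" ")
    )
  end
end

fetch_recipe_urls.map { |url| Scraper.new(url).call }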

radubogdan

Or if you want to pass the array to scraper...

def fetch_recipe_urls
  ...
  recipes = scraper(recipes_url_array)
end


def scraper(urls)
  recipes = []
  urls.each do |url|
    html_file = open(url).read
    html_doc = Nokogiri::HTML(html_file)
    recipe = {
      title: html_doc.css('h1.recipe-title').text.strip,
      time: html_doc.css('span.recipe-yield-value').text.split("servings")[1],
      steps: html_doc.css('ol.recipe-steps').text.split.join(" "),
      ingredients: html_doc.css('ul.recipe-ingredients').text.split.join(" ")
    }
    recipes << recipe
  end
  recipes
end
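
With this version, calling fetch_recipe_urls runs the whole pipeline, assuming its elided body still builds recipes_url_array as in the question:

all_recipes = fetch_recipe_urls
all_recipes.first[:title] # => title of the first scraped recipe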
sam
  • It helps more if you supply an explanation why this is the preferred solution and explain how it works. We want to educate, not just provide code. – the Tin Man Dec 11 '19 at 06:38
  • Thanks for the feedback, @theTinMan. I think you're right. I threw this up here with no explanation. I'm not sure I think it's a better way to do it, just the way that occurred to me. It felt like less refactoring. Plus I'm less familiar with Ruby's mapping. – sam Dec 12 '19 at 11:45
  • You want to be careful doing `css(...).text`. `css`, like `search` and `xpath`, returns a NodeSet. Using `text` on a NodeSet concatenates all the text, resulting in a mess that's usually impossible to untangle; a short illustration follows below. See https://stackoverflow.com/q/43594656/128421 for more information. – the Tin Man Dec 12 '19 at 21:20
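
As a minimal illustration of the NodeSet pitfall that comment describes (the HTML fragment is made up for the example):

require 'nokogiri'

doc = Nokogiri::HTML(<<~HTML)
  <ul class="recipe-ingredients">
    <li>2 eggs</li>
    <li>1 cup flour</li>
  </ul>
HTML

# NodeSet#text concatenates the text of every matched node:
doc.css('li').text          # => "2 eggs1 cup flour"

# Mapping over the nodes keeps each ingredient separate:
doc.css('li').map(&:text)   # => ["2 eggs", "1 cup flour"]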