
I've created a Ruby script that runs fine when I execute it from the Rails console.

The script fetches some information from various websites and saves it to my database table.

However, when I turn the code into a Rake task, it still runs but does not save any new records. Rake doesn't report any errors either.

```ruby
# Add your own tasks in files placed in lib/tasks ending in .rake,
# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.

require File.expand_path('../config/application', __FILE__)

Rails.application.load_tasks

require './crawler2.rb'
task :default => [:crawler]

task :crawler do

  ### ###

  require 'rubygems'
  require 'nokogiri'
  require 'open-uri'

  start = Time.now

  $a = 0

  sites = ["http://www.nytimes.com", "http://www.news.com"]

  for $a in 0..sites.size-1

    url = sites[$a]
    $i = 75
    $error = 0

    avoid_these_links = ["/tv", "//www.facebook.com/"]

    doc = Nokogiri::HTML(open(url))

    links = doc.css("a")
    hrefs = links.map { |link| link.attribute('href').to_s }
                 .uniq
                 .sort
                 .delete_if { |href| href.empty? }
                 .delete_if { |href| avoid_these_links.any? { |w| href =~ /#{w}/ } }
                 .delete_if { |href| href.size < 10 }

    #puts hrefs.length
    #puts hrefs

    for $i in 0..hrefs.length
      begin
        #puts hrefs[60] #for debugging

        #file = open(url)
        #doc = Nokogiri::HTML(file) do

        if hrefs[$i].downcase().include? "http://"
          doc = Nokogiri::HTML(open(hrefs[$i]))
        else
          doc = Nokogiri::HTML(open(url+hrefs[$i]))
        end

        image = doc.at('meta[property="og:image"]')['content']
        title = doc.at('meta[property="og:title"]')['content']
        article_url = doc.at('meta[property="og:url"]')['content']
        description = doc.at('meta[property="og:description"]')['content']
        category = doc.at('meta[name="keywords"]')['content']

        newspaper_id = 1

        puts "\n"
        puts $i
        #puts "Image: " + image
        #puts "Title: " + title
        #puts "Url: " + article_url
        #puts "Description: " + description
        puts "Category: " + category

        Article.create({
          :headline => title,
          :caption => description,
          :thumbnail_url => image,
          :category_id => 3,
          :status => true,
          :journalist_id => 2,
          :newspaper_id => newspaper_id,
          :from_crawler => true,
          :description => description,
          :original_url => article_url
        }) unless Article.exists?(original_url: article_url)

        $i += 1

        #puts $i #for debugging

      rescue
        #puts "Error here: " + url+hrefs[$i] if $i < hrefs.length
        $i += 1    # do_something_* again, with the next i
        $error += 1
      end
    end

    puts "Page: " + url
    puts "Articles: " + hrefs.length.to_s
    puts "Errors: " + $error.to_s

    $a += 1
  end

  finish = Time.now

  diff = ((finish - start)/60).to_s

  puts diff + " Minutes"

  ### ###

end
```

The code runs fine if I save the file as crawler.rb and load it in the console with `load './crawler2.rb'`. When I put the exact same code in a Rake task, no new records are created.
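Independent of the Rake issue, the long `hrefs = links.map ...` chain and the `url+hrefs[$i]` concatenation in the question can be split into two small helpers. This is only a sketch: `filter_hrefs` and `absolute_url` are hypothetical names, `href.include?(w)` treats each `avoid_these_links` entry as a literal substring (the original interpolates it into a regex, where `.` matches any character), and `URI.join` from Ruby's standard library is swapped in for string concatenation so that relative links without a leading slash and `https://` links also resolve. It iterates with `each`; the original `for $i in 0..hrefs.length` runs one index past the end, so its last iteration always hits `nil` and lands in the `rescue`.

```ruby
require "uri"

# Same filters as the question's one-liner: drop empty hrefs, hrefs that
# contain any entry of avoid_these_links, and hrefs shorter than 10 chars.
# Entries are matched as literal substrings here, not as regexes.
def filter_hrefs(hrefs, avoid_these_links)
  hrefs.uniq.sort
       .reject(&:empty?)
       .reject { |href| avoid_these_links.any? { |w| href.include?(w) } }
       .reject { |href| href.size < 10 }
end

# Resolves a possibly relative href against the page it was found on.
# Absolute hrefs (http:// or https://) come back unchanged.
# May raise URI::InvalidURIError for malformed hrefs.
def absolute_url(base, href)
  URI.join(base, href).to_s
end

hrefs = ["", "/tv/schedule", "/short", "/section/world", "http://example.com/story"]

# `each` visits exactly hrefs.size elements, with no index past the end.
filter_hrefs(hrefs, ["/tv", "//www.facebook.com/"]).each do |href|
  puts absolute_url("http://www.nytimes.com", href)
end
```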

  • Feels like there's something missing here. The `task :crawler do` is never closed with an `end`. Is the `Article` creation actually inside the task? The indentation suggests maybe not. – jaydel Aug 22 '16 at 20:16
  • Thanks for the input, but I'm afraid that's not it. I tested it with some print/puts statements and these work perfectly. It's as if the code just skips over the `.create` part. I don't know whether I'm using Rake the wrong way or whether the syntax is wrong. – Martin Clausen Aug 22 '16 at 20:39
  • The syntax is wrong. A 'do' requires an 'end' somewhere. – jaydel Aug 22 '16 at 21:06
  • I've updated the question. I'm not missing an 'end' :/ – Martin Clausen Aug 22 '16 at 21:26
  • To debug, change `Article.create` to `Article.create!`. It will raise error if something goes wrong. Also, is the DB to which you're writing the same in both cases? – Utsav Kesharwani Aug 22 '16 at 21:36
  • Please format your code to Ruby standards, and eliminate scrolling. – zhon Aug 23 '16 at 02:10
  • Welcome to Stack Overflow. You can improve your question. Please read [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). When your code shows your precise problem with nothing extra, you are showing respect to those who volunteer to help you. – zhon Aug 23 '16 at 02:11
  • I've formatted the code accordingly. I tried `Article.create!`, which does not raise any errors. It still just runs the script and prints the output, but no records are created in the database. – Martin Clausen Aug 23 '16 at 05:22
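A plausible reason `create!` stays silent (an inference, not confirmed in the thread): the inner loop wraps each iteration in a bare `rescue`, which catches any `StandardError`. Without the Rails environment loaded, referencing `Article` most likely raises `NameError` (uninitialized constant), and `NameError` is a `StandardError`, so the failure is only counted in `$error` whether `create` or `create!` is used. The following plain-Ruby sketch reproduces that pattern; `UndefinedArticle` and `save_article` are hypothetical stand-ins, and `rescue => e` keeps the exception so its cause can be printed.

```ruby
# Hypothetical stand-in for the body of one crawler iteration.
# `UndefinedArticle` plays the role of `Article` when the Rails
# environment has not been loaded, so referencing it raises NameError.
def save_article(title)
  UndefinedArticle.create!(headline: title)
end

errors = 0
messages = []

["First headline", "Second headline"].each do |title|
  begin
    save_article(title)
  rescue => e
    # Like a bare `rescue`, `rescue => e` catches any StandardError
    # (NameError included), but it keeps the exception object so the
    # real cause can be reported instead of only being counted.
    errors += 1
    messages << "#{e.class}: #{e.message}"
  end
end

puts "Errors: #{errors}" # prints "Errors: 2"
puts messages
```

In the crawler itself, printing `e.class` and `e.message` inside the existing `rescue` would likely have shown the underlying error.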

1 Answer


I figured out what was wrong.

I needed to remove:

```ruby
require './crawler2.rb'
task :default => [:crawler]
```

and instead declare `:environment` as a prerequisite of the task:

```ruby
task :crawler => :environment do
```
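The `=> :environment` part declares a Rake prerequisite: Rake runs the `:environment` task before the task body. In a Rails app that task is provided by Rails and boots the application (it loads `config/environment.rb`), which is what makes `Article` and the database connection available. The following standalone sketch shows only the prerequisite ordering with plain Rake; its `:environment` task is a stub that records that it ran, and `RUN_LOG` is a hypothetical name. In a real Rails app you would not define `:environment` yourself.

```ruby
require 'rake'
extend Rake::DSL

# Records the order in which tasks run, so the prerequisite ordering is visible.
RUN_LOG = []

# Stub standing in for the :environment task that Rails defines.
# Here it only records that it ran; it does not boot anything.
task :environment do
  RUN_LOG << :environment
end

# Declaring :environment as a prerequisite makes Rake invoke it first.
task crawler: :environment do
  RUN_LOG << :crawler
end

Rake::Task[:crawler].invoke
puts RUN_LOG.inspect # prints [:environment, :crawler]
```

Following the comment at the top of the Rakefile, the task can also live in its own file under `lib/tasks/` ending in `.rake` (for example a hypothetical `lib/tasks/crawler.rake`), where Rails picks it up automatically, and be run with `rake crawler`.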

Now the crawler runs every ten minutes with a bit of help from Heroku Scheduler :-)

Thanks for the help, guys, and sorry for the bad formatting. I hope this answer helps others.