Here's one way:
Create a model such as this:
class Entry < ActiveRecord::Base
  # Must list every attribute that create! mass-assigns below.
  attr_accessible :entry_id, :feed_id, :url, :title, :summary, :description, :published_at

  # Fetch the feed registered under feed_name and store any new entries.
  def self.update_from_feed(feed_name)
    feed = Feed.find_by_name(feed_name)
    feed_data = Feedjira::Feed.fetch_and_parse(feed.feed_url)
    add_entries(feed_data.entries, feed)
  end

  def self.add_entries(entries, feed)
    entries.each do |entry|
      # Stop at the first entry already stored (assumes newest-first order).
      break if exists? :entry_id => entry.id
      create!(
        :entry_id => entry.id,
        :feed_id => feed.id,
        :url => entry.url,
        :title => entry.title.sanitize,
        :summary => entry.summary.sanitize,
        :description => entry.content.sanitize,
        :published_at => entry.published
      )
    end
  end
  # A bare `private` has no effect on class methods, so hide it explicitly.
  private_class_method :add_entries
end
You can then call this from the command line, cron or similar with, for example:
rails runner -e development 'Entry.update_from_feed("feedname")'
This runs the update_from_feed method in the context of your Rails app using a separate Rails instance (a bit like rails console), but doesn't impact the running Rails instance.
In this example, there's a separate Feed model which has name and feed_url attributes, so the URL is looked up from the provided name.
This code doesn't use the ability of Feedjira to check for updates, so dupe checking is baked in.
(This GitHub issue says to avoid using the #update method.)
Note that the use of break assumes that new entries are always added to the top of the feed. If you don't trust the feed, replace break with next, so that known entries are skipped instead of ending the loop. The url can be used as an alternative unique id.
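To illustrate the trade-off between stopping at the first known entry and skipping known entries, here is a minimal plain-Ruby sketch. It doesn't touch Rails or Feedjira: FeedItem, import_items and the stop_at_first_duplicate flag are hypothetical names invented for the illustration, and a Set stands in for the entries table.

```ruby
require "set"

# Hypothetical stand-in for a parsed feed entry (not Feedjira's class).
FeedItem = Struct.new(:id, :title)

# Imports items in feed order; `seen` is a Set of ids that are already stored.
# stop_at_first_duplicate: true  -> stop at the first known id (the break approach)
# stop_at_first_duplicate: false -> skip known ids and keep scanning the feed
def import_items(items, seen, stop_at_first_duplicate: true)
  added = []
  items.each do |item|
    if seen.include?(item.id)
      break if stop_at_first_duplicate
      next
    end
    seen << item.id
    added << item.id
  end
  added
end

# A feed that is NOT strictly newest-first: "c" is new but sits below the known "a".
items = [FeedItem.new("b", "New"), FeedItem.new("a", "Old"), FeedItem.new("c", "Late")]

fast = import_items(items, Set["a"])                                  # => ["b"]
safe = import_items(items, Set["a"], stop_at_first_duplicate: false)  # => ["b", "c"]
```

Stopping early is cheaper on long feeds, but as the second call shows, it misses any new entry that appears below one you already have.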
Edit:
Here's a version of the update_from_feed method that takes advantage of Feedjira's ability to process multiple feeds:
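# Named update_all_from_feed to match the call in the looping method below
# and to avoid shadowing ActiveRecord's own update_all.
def self.update_all_from_feed
  feed_urls = Feed.pluck :feed_url
  # With an array of URLs, the result is a hash keyed by feed URL.
  feeds = Feedjira::Feed.fetch_and_parse(feed_urls)
  feed_urls.each do |feed_url|
    feed = Feed.find_by_feed_url(feed_url)
    add_entries(feeds[feed_url].entries, feed)
  end
end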
pluck returns all the values of the specified column(s) (:feed_url in this case) as an array. Equally, you could change the method to accept an array of names, from which it looks up an array of URLs to pass to Feedjira.
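As a rough illustration of the name-to-URL lookup, here is a plain-Ruby sketch that runs without a database. FeedRow, FEED_ROWS and feed_urls_for are hypothetical stand-ins, and the example.com URLs are made up. In the real model the same result would presumably come from something like Feed.where(:name => names).pluck(:feed_url).

```ruby
# Hypothetical in-memory stand-in for rows of the Feed model.
FeedRow = Struct.new(:id, :name, :feed_url)

FEED_ROWS = [
  FeedRow.new(1, "ruby",  "http://example.com/ruby.xml"),
  FeedRow.new(2, "rails", "http://example.com/rails.xml"),
  FeedRow.new(3, "misc",  "http://example.com/misc.xml")
].freeze

# Returns the feed URLs for the given names, in row order.
# This mimics filtering by name and then plucking a single column.
def feed_urls_for(names, rows = FEED_ROWS)
  rows.select { |row| names.include?(row.name) }.map(&:feed_url)
end

urls = feed_urls_for(%w[ruby misc])
# urls can then be handed to the fetch call in place of all feed URLs
```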
Finally, if you wanted a self-looping method, you could include:
def self.update_all_periodically(frequency = 15.minutes)
  loop do
    update_all_from_feed
    sleep frequency.to_i
  end
end
Then this:
rails runner -e development 'Entry.update_all_periodically'
won't return until you kill the process, and will update all feeds at the default frequency, or at the one passed as an optional argument.
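One thing to watch with a long-running loop like this is that a single failed fetch raises and ends the whole process. Below is a minimal plain-Ruby sketch of a more forgiving loop. run_periodically and its max_runs parameter are hypothetical, added only so the example can be bounded and run on its own; the block stands in for the update call.

```ruby
# Runs the given block repeatedly, sleeping `interval` seconds between runs.
# `max_runs` is a hypothetical parameter for bounding the loop (e.g. in tests);
# nil means loop forever. Returns the number of completed runs.
def run_periodically(interval, max_runs: nil)
  runs = 0
  loop do
    begin
      yield
    rescue StandardError => e
      # Report the failure and carry on, so one bad fetch does not end the loop.
      warn "update failed: #{e.class}: #{e.message}"
    end
    runs += 1
    break if max_runs && runs >= max_runs
    sleep interval
  end
  runs
end

calls = 0
completed = run_periodically(0, max_runs: 3) do
  calls += 1
  raise "simulated network error" if calls == 2
end
```

Here the second run fails, the error is written to stderr, and the third run still happens. In the model, the block body would be the call that updates all feeds.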
If you wanted to run the updates asynchronously in your main Rails process, then a background worker such as Sidekiq, Resque or DelayedJob will do the... job. :)