Ruby on Rails Feed/RSS Aggregator (35 lines)

I wrote myself a feed aggregator for my front page. And… voila! I’m finally satisfied enough with it to post it.

Update: I've now published this as a complete standalone Rails app on github/sbwoodside/portal. The important bits are app/controllers/portal_controller.rb and config/config.yml.

I run this as a standalone Rails app, separate from my weblog. You could do the same (and redirect requests for / or /index.html to it with Apache, nginx, etc.), or you could integrate it into your own app. Up to you.

Features:

Aggregates ANY feed, no matter how badly mangled by its creators, using FeedTools (I also tried feed_normalizer and simple rss, but they're not as good)
Works around slow feed downloads and slow REXML parsing by caching the results
Handles recaching with a simple periodic HTTP request from cron
Displays the feeds in a Facebook-like news feed format, sorted by date
Lets you easily relabel feeds and add or replace them (by editing the code)
Only 35 lines of controller code!

The heart of it is the controller, obviously. The best thing? It’s only one page of code! Ruby rocks!

require 'feed_tools'

class PortalController < ApplicationController
  layout 'site'
  # Instructions: 1. Change @@secret. 2. Add a cron job to regularly call /?recache=yes&secret=XXXXXXX
  # This is a feed aggregator that uses FeedTools because it handles practically any feed.
  # But FeedTools is super slow in every way so this aggregator stops using it as soon as possible.
  # TODO add XML feed output
  
  @@secret = "change_this" # change this to protect your site from DoS attack
  # The array of feeds you want to aggregate. If you change this then manually delete the whole cache.
  @@uris = ['http://simonwoodside.com:8080/posts/rss', 'http://simonwoodside.com/comments/rss',
            'http://semacode.com/posts/rss',
            'http://api.flickr.com/services/feeds/photos_public.gne?id=20938094@N00&lang=en-us&format=rss_200',
            'http://api.flickr.com/services/feeds/activity.gne?user_id=20938094@N00']
  # A map between the "official" feed titles in the XML, and the titles you want to show when rendered.
  @@title_map = { "Simon Says" => "Simon Says:", "Simon Says: Comments" => "Simon Says comment:",
                  "Uploads from sbwoodside" => "Flickr picture:", "Semacode" => "Semacode blog post:",
                  'Comments on your photostream and/or sets' => 'Flickr comment:' }
  
  def index
    if params[:recache] and @@secret == params[:secret]
      cache_feeds
      expire_fragment(:controller => 'portal', :action => 'index') # next load of index will re-fragment cache
      render :text => "Done recaching feeds"
    else
      @aggregate = read_cache unless read_fragment({})
    end
  end
  
private
  # Fetch every feed with FeedTools and store it in the DB as plain Hashes, replacing any
  # cached row with the same URI. This is slow, so be careful not to tie up the DB connection.
  def cache_feeds
    logger.info "Caching feeds... (can be slow)"
    feeds = @@uris.map do |uri|
      feed = FeedTools::Feed.open( uri )
      { :uri => uri, :title => feed.title,
        :items => feed.items.map { |item| {:title => item.title, :published => item.published, :link => item.link} } }
    end
    feeds.each do |feed|
      cached = CachedFeed.find_or_initialize_by_uri( feed[:uri] )
      cached.parsed_feed = feed
      cached.save!
    end
  end
  # Build one array of hashes, each { :feed_title, :feed_item }, newest first.
  # Feeds that haven't been cached yet (e.g. a URI just added to @@uris) are skipped.
  def read_cache
    cached_feeds = @@uris.map { |uri| CachedFeed.find_by_uri( uri ) }.compact
    cached_feeds.map { |cached|
      feed = cached.parsed_feed
      feed[:items].map { |item| {:feed_title => @@title_map[feed[:title]] || feed[:title], :feed_item => item} }
    }.flatten.sort_by { |item| item[:feed_item][:published] }.reverse
  end
end
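The merge step in read_cache is easy to try outside Rails. Here's a minimal sketch in plain Ruby, assuming the cached feeds are the Hashes built by cache_feeds, with nil standing in for a URI that hasn't been cached yet. The aggregate name is hypothetical, and unlike the controller it sorts undated items to the end instead of raising when sort_by compares nil with a Time.

```ruby
# Plain-Ruby sketch of the merge-and-sort step in read_cache.
# `feeds` holds Hashes shaped like the ones cache_feeds builds
# ({ :uri, :title, :items }); nil means "not cached yet".
# `aggregate` is a hypothetical helper name, not part of the app.
def aggregate(feeds, title_map)
  entries = feeds.compact.flat_map do |feed|
    # Use the friendly label if there is one, otherwise the feed's own title.
    label = title_map[feed[:title]] || feed[:title]
    feed[:items].map { |item| { :feed_title => label, :feed_item => item } }
  end
  # Newest first. Undated items get the epoch so they sink to the end
  # instead of making sort_by raise on a nil-vs-Time comparison.
  entries.sort_by { |entry| entry[:feed_item][:published] || Time.at(0) }.reverse
end
```

Calling aggregate(feeds, @@title_map) with the Hashes pulled from CachedFeed gives the same shape of array the view iterates over.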

It’s actually pretty simple, but it took me a while to get the balance just right. What you need to do is set up a cron job (or some other repeating task) that loads http://mywebsite.com/?recache=yes&secret=XXXXXXXX … every once in a while. You can use wget, curl, or whatever you like, and recache every minute, every five minutes, every hour, whatever suits you. Since the recaching is done by the controller itself, there’s no need to run BackgrounDRb, RubyCron, or any of the other options listed at HowToRunBackgroundJobsInRails. Yay!
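If you’d rather trigger the recache from Ruby than from wget or curl, here’s a minimal sketch using only the standard library’s Net::HTTP. The helper names, the base URL and the 300-second read timeout are my own placeholders; the longer timeout is there because the controller only answers after every feed has been fetched.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the URL that PortalController#index checks for
# (params[:recache] and params[:secret]). base_url is a placeholder for your site.
def recache_uri(base_url, secret)
  uri = URI.parse(base_url)
  # encode_www_form escapes the secret, so characters like & or spaces are safe.
  uri.query = URI.encode_www_form('recache' => 'yes', 'secret' => secret)
  uri
end

# Hypothetical helper: request the recache and report whether the app answered 2xx.
# The read timeout is raised well above Net::HTTP's 60-second default
# (300 s is an arbitrary choice) because fetching all the feeds can be slow.
def trigger_recache(base_url, secret, read_timeout = 300)
  uri = recache_uri(base_url, secret)
  Net::HTTP.start(uri.host, uri.port,
                  :use_ssl => uri.scheme == 'https',
                  :read_timeout => read_timeout) do |http|
    http.get(uri.request_uri).is_a?(Net::HTTPSuccess)
  end
end
```

A cron job could then run a tiny script that calls trigger_recache('http://mywebsite.com/', 'XXXXXXXX') and exits non-zero when it returns false.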

Here’s the view:


<div id="feed-stream">
  <% cache do %>
    <%
      lastday = -1
      @aggregate.each do |item| %>
        <div class="item">
        <%
          mydate = item[:feed_item][:published].getlocal
          if mydate.yday != lastday
            %><div class="item_details"><p style="text-align:right"><%= mydate.strftime('%A, %B %e') %></p></div><%
            lastday = mydate.yday
          end
        %>
          <div class="item_content">
            <%= h(item[:feed_title]) %>
            <a href="<%= h(item[:feed_item][:link]) %>"><%= h(item[:feed_item][:title]) %></a>
          </div>
        </div>
    <% end %>
  <% end %>
</div>
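The view inserts a date heading whenever the day changes, comparing Time#yday. Here’s a sketch of the same grouping in plain Ruby, as a hypothetical group_by_day helper built on Enumerable#chunk. It keys on the full Date, so consecutive items from, say, 12 January 2008 and 12 January 2007 get separate headings, which yday alone would merge. Like the view, it assumes every item has a :published time.

```ruby
require 'date'

# Hypothetical helper: split the already-sorted aggregate into runs of
# consecutive entries that share a calendar day, as [date, entries] pairs.
# Call getlocal on the times first if you want the server's local dates,
# as the view does; here the date is taken in each Time's own zone.
def group_by_day(aggregate)
  aggregate.chunk { |entry| entry[:feed_item][:published].to_date }.to_a
end
```

The view could then loop over group_by_day(@aggregate), printing one heading per date and the entries underneath it.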

My cache holds nothing but Hashes. I don’t cache the FeedTools objects because I discovered that even after FeedTools has parsed a feed, reading the supposedly “final” data is incredibly slow (maybe 10x to 100x slower than reading a Hash).
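Here’s a small sketch of why plain Hashes make a good cache: you copy the few fields the view needs off the slow object once, and the result round-trips cleanly through YAML, which is what serialize uses by default in the Rails versions of this era. SlowItem is a hypothetical stand-in for a FeedTools item, and YAML.safe_load with permitted_classes: assumes Ruby 2.6 or later.

```ruby
require 'yaml'

# Hypothetical stand-in for a FeedTools item (the real one is far slower to read).
SlowItem = Struct.new(:title, :published, :link)

# Copy just the fields the view needs into a plain Hash, as cache_feeds does.
def to_plain_hash(item)
  { :title => item.title, :published => item.published, :link => item.link }
end

# Dump to YAML and load it back. safe_load must be told that Symbol keys
# and Time values are allowed, otherwise it refuses to rebuild them.
def yaml_round_trip(hash)
  YAML.safe_load(hash.to_yaml, permitted_classes: [Symbol, Time])
end
```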

Here’s the model:


require 'feed_tools'
class CachedFeed < ActiveRecord::Base
  validates_presence_of :uri, :parsed_feed
  validates_uniqueness_of :uri
  serialize :parsed_feed, Hash # if the serialized YAML exceeds the column size, it will likely get truncated and fail to load (it comes back as a String, not a Hash)
end

And the migration:


class CreateCachedFeeds < ActiveRecord::Migration
  def self.up
    create_table :cached_feeds do |t|
      t.column :uri, :string, :limit => 2048
      t.column :parsed_feed, :text, :limit => 128.kilobytes # use for serialized object
      t.timestamps
    end
  end

  def self.down
    drop_table :cached_feeds
  end
end
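The model’s comment warns that an oversized feed may fail to load. One way to guard against that is to check the YAML size before saving and drop items until it fits. This is a sketch under assumptions: the limit mirrors the migration’s :limit => 128.kilobytes, YAML is the serializer, and items arrive newest first, so trimming from the end drops the oldest. fits_in_column? and trim_to_fit are hypothetical helpers.

```ruby
require 'yaml'

# Mirrors :limit => 128.kilobytes on the parsed_feed column (128 * 1024 bytes).
PARSED_FEED_LIMIT = 128 * 1024

# Would this Hash's YAML fit in the column without being cut off?
def fits_in_column?(parsed_feed, limit = PARSED_FEED_LIMIT)
  parsed_feed.to_yaml.bytesize <= limit
end

# Drop items from the end (assumed to be the oldest) until the YAML fits.
# Returns a new Hash; the original is left untouched.
def trim_to_fit(parsed_feed, limit = PARSED_FEED_LIMIT)
  items = parsed_feed[:items]
  items = items[0...-1] until items.empty? || fits_in_column?(parsed_feed.merge(:items => items), limit)
  parsed_feed.merge(:items => items)
end
```

In cache_feeds you could then assign trim_to_fit(feed) to parsed_feed instead of feed.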

Well, that’s all you need. When I set out to build this, I thought I’d find a simple example out there, but there wasn’t anything. It turns out there are a number of interesting challenges: picking a parser that copes with difficult feeds and malformed XML, caching, and background processing. It took me a while to get it all just right.

It powers my own front page … consider it to be under the standard Ruby open source license. As the vending machine says: Share And Enjoy!