Friday, September 29, 2006

RSS feeds from Ruby

A friend asked me if I knew a good way to create RSS feeds. He wanted RSS feeds for some web sites that didn't already have them. My favorite answer: Ruby.

So I created a demo script in Ruby that parses Fatwallet.com and creates an RSS feed for topics that are rated "Better" or higher. The hardest part of this problem was parsing the HTML, but Ruby made that pretty easy. Creating the RSS information was very simple.

The code below has plenty of comments because my friend doesn't know Ruby. His background is more .Net, Java, and Perl.

# This is a simple example. It reads the Hot Topics forum
# on Fatwallet.com, finding all topics rated "Better" or higher.
# It prints these topics and their URLs as it finds them.
# Finally, it prints the RSS feed for this information.
# See: http://www.fatwallet.com/c/18/
#
# Three steps to run this script...
#
# 1. Get Ruby from http://www.ruby-lang.org/en/downloads/
#    I highly recommend the One-click Installer for Windows.
#
# 2. This program uses an extra library, Hpricot, to parse HTML.
#    To get Hpricot installed, use Ruby's package manager "Gems".
#    Just run the following at your DOS cmd prompt:
#    gem install hpricot
#
# 3. Run the program from the DOS cmd prompt:
#    ruby RubyBot-forRich.rb
#
# RSS feed creation using the RSS library for Ruby
# see: http://www.ruby-doc.org/stdlib/libdoc/rss/rdoc/index.html
# tutorial: http://www.cozmixng.org/~rwiki/?cmd=view;name=RSS+Parser%3A%3ATutorial.en

# "require" is not like "using" in C# or "#include" in C++.
# require searches Ruby's load path for the named file and executes it
# (only the first time that file is required).
require 'rubygems'
require 'open-uri'
require 'rss/maker'
begin
  require 'hpricot'
rescue LoadError # this is like a Try Catch, but there are nifty differences
  puts 'Please run "gem install hpricot" before running this program.'
  exit
end

BETTER = 4 # a constant

doc = Hpricot(open('http://www.fatwallet.com/c/18/'))

rss = RSS::Maker.make("1.0") { |maker|
  #
  # Let me explain what is happening...
  # The ".make" method creates a Maker object and passes it into
  # the block (i.e. everything between { and }) as the variable 'maker'.
  # It then executes the block and returns the RSS object that
  # was built while the block ran.
  #
  maker.channel.about = 'http://www.fatwallet.com/c/18/'
  maker.channel.title = 'Fatwallet.com Hot Deals Forum'
  maker.channel.description = 'Hot deals rated "Better" or higher.'
  maker.channel.link = 'http://www.fatwallet.com/c/18/'

  puts 'Searching...' # puts is similar to WriteLine in .Net

  (doc/'tr').each { |tr|
    #
    # (doc/'tr') => performs an XPath search on doc and returns a collection of <tr> nodes
    # .each => iterates over the collection, passing elements one at a time to the "{ |tr| ... }" block
    # { |tr| ... } => this is a closure; the variable 'tr' is set to the element passed in
    #
    # similar to C#...
    # foreach( Node tr in doc.FindAllNodes( 'tr' ) ) { ... }
    # but not really, since it is using a closure.
    #
    # I loop over the <tr> tags because I need to refer back to the row later
    # to find the links it contains.
    #
    (tr/'td/img[@title]').each { |img|
      if img['title'] =~ /rating: (\d+)/
        # $1 is the first group from the match, the rating number
        if $1.to_i >= BETTER
          (tr/'a[@href]').each { |a|
            if a['href'] =~ /^\/t\/18/
              puts "http://fatwallet.com#{a['href']} #{a.inner_html}"

              item = maker.items.new_item
              item.link = "http://fatwallet.com#{a['href']}"
              item.title = a.inner_html
            end
          } # each a
        end # if rating >= better
      end # if title is rating
    } # each img
  } # each tr
} # RSS maker

puts "\nRSS..."
puts rss
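
For readers new to Ruby, the two idioms the script leans on (a method that hands a builder object to a block, and a regex match that leaves its capture in $1) can be tried without Hpricot or a network connection. The sketch below is only an illustration: FeedSketch, its make method, and the sample titles are hypothetical stand-ins, not part of the RSS library or of Fatwallet's pages.

```ruby
# Hypothetical stand-in for RSS::Maker: it only collects a title and items.
class FeedSketch
  attr_accessor :title
  attr_reader :items

  def initialize
    @items = []
  end

  # Same shape as RSS::Maker.make: create an object, yield it to the
  # caller's block, then return the finished object.
  def self.make
    feed = new
    yield feed
    feed
  end
end

threshold = 4 # a local variable; the blocks below can see it (closure)
titles = ['rating: 5', 'rating: 2', 'no rating here', 'rating: 4']

feed = FeedSketch.make { |maker|
  maker.title = 'Sketch feed'
  titles.each { |title|
    # After a successful match, $1 holds the first capture group.
    if title =~ /rating: (\d+)/ && $1.to_i >= threshold
      maker.items << title
    end
  }
}

puts feed.title
puts feed.items.inspect
```

Only the two titles rated 4 or higher end up in feed.items. The block never returns the feed itself; make does that after the block finishes, which is the same reason the script above can assign the result of RSS::Maker.make to rss.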

Long time no post... why?

Two months and nothing posted. The reason is that my wife is pregnant with our third child. Just like the first two, the pregnancy is very difficult, so I haven't done much blogging or extra coding.