So I created a demo script in Ruby that parses Fatwallet.com and creates an RSS feed for topics rated "Better" or higher. The hardest part of the problem was parsing the HTML, but Ruby and the Hpricot library made that pretty easy. Creating the RSS feed was very simple.
The code below has plenty of comments because my friend doesn't know Ruby. His background is more .NET, Java, and Perl.
# This is a simple example. It reads the Hot Topics forum
# on Fatwallet.com finding all topics rated "Better" or higher.
# It will print these topics and their URL as it finds them.
# Finally, it prints the RSS feed for this information.
# See: http://www.fatwallet.com/c/18/
#
# Three steps to run this script...
#
# 1. Install Ruby. You can get it from http://www.ruby-lang.org/en/downloads/
#    I highly recommend the One-click Installer for Windows.
#
# 2. This program uses an extra library, Hpricot, to parse HTML.
#    To install Hpricot, use Ruby's package manager, RubyGems.
#    Just run the following at your DOS cmd prompt:
#      gem install hpricot
#
# 3. To run the program from the DOS cmd prompt:
#      ruby RubyBot-forRich.rb
#
# RSS feed creation using the RSS library for Ruby
# see: http://www.ruby-doc.org/stdlib/libdoc/rss/rdoc/index.html
# tutorial: http://www.cozmixng.org/~rwiki/?cmd=view;name=RSS+Parser%3A%3ATutorial.en
# "require" is not like "using" in C# nor "#include" in C++.
# require searches Ruby's load path for the named file and executes it (once per file).
require 'rubygems'
require 'open-uri'
require 'rss/maker'
begin
  require 'hpricot'
rescue LoadError # this is like a Try Catch, but there are nifty differences
  puts 'Please run "gem install hpricot" before running this program.'
  exit
end
BETTER = 4 # a constant
doc = Hpricot(open('http://www.fatwallet.com/c/18/'))
rss = RSS::Maker.make("1.0") { |maker|
  #
  # Let me explain what is happening...
  # The ".make" method creates a Maker object and passes it into
  # the block (i.e. everything between { and }) as the variable 'maker'.
  # It executes the block, then returns the RSS object that
  # was built while executing the block.
  #
  maker.channel.about = 'http://www.fatwallet.com/c/18/'
  maker.channel.title = 'Fatwallet.com Hot Deals Forum'
  maker.channel.description = 'Hot deals rated "Better" or higher.'
  maker.channel.link = 'http://www.fatwallet.com/c/18/'
  puts 'Searching...' # puts is similar to Console.WriteLine in .NET
  (doc/'tr').each { |tr|
    #
    # (doc/'tr') => performs an XPath search on doc and returns a collection of <tr> nodes
    # .each => iterates over the collection, passing elements one at a time to the "{ |tr| ... }" block
    # { |tr| ... } => this is a closure; the variable 'tr' is set to the element passed in
    #
    # similar to C#...
    # foreach( Node tr in doc.FindAllNodes( 'tr' ) ) { ... }
    # but not really since it is using a closure.
    #
    # I had to loop on <tr> HTML tags because I need to reference this tag later.
    #
    (tr/'td/img[@title]').each { |img|
      if img['title'] =~ /rating: (\d+)/
        # $1 is the first group from the match, the rating number
        if $1.to_i >= BETTER
          (tr/'a[@href]').each { |a|
            if a['href'] =~ /^\/t\/18/
              puts "http://fatwallet.com#{a['href']} #{a.inner_html}"
              item = maker.items.new_item
              item.link = "http://fatwallet.com#{a['href']}"
              item.title = a.inner_html
            end
          } # each a
        end # if rating >= better
      end # if title is rating
    } # each img
  } # each tr
} # RSS maker
puts "\nRSS..."
puts rss
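If you want to play with the two ideas the script leans on most without a network connection or Hpricot, here is a minimal sketch in plain Ruby. It shows a method that hands an object to a block and then returns it (the same shape as ".make"), and a regex capture read back through $1. The names make_list and hot_topics, and the sample rows, are made up for illustration; they are not part of the RSS library or of Fatwallet's markup.

```ruby
# Hypothetical stand-in for RSS::Maker.make: create an object,
# pass it to the block, then return what the block filled in.
def make_list
  list = []
  yield list
  list
end

# Keep the link text of every row whose title carries a rating
# at or above the threshold (4 plays the role of BETTER).
def hot_topics(rows, threshold = 4)
  make_list do |list|
    rows.each do |title, text|
      # =~ sets $1 to the first capture group when the match succeeds
      next unless title =~ /rating: (\d+)/
      list << text if $1.to_i >= threshold
    end
  end
end

# Made-up rows: [img title attribute, link text]
rows = [
  ['rating: 5', 'Great deal'],
  ['rating: 3', 'So-so deal'],
  ['not rated', 'Unrated deal'],
  ['rating: 4', 'Better deal']
]

puts hot_topics(rows) # prints "Great deal" and "Better deal", one per line
```

Passing the list into the block and returning it afterwards is only one way to write this; it is used here because it mirrors how the script above fills in 'maker' and then gets the finished feed back.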