Friday, September 29, 2006

RSS feeds from Ruby

A friend asked me if I knew a good way to create RSS feeds. He wanted RSS feeds created for some web sites that didn't already have them. My favorite answer: Ruby.

So I created a demo script in Ruby that parses Fatwallet.com and creates an RSS feed for topics that are rated "Better" or higher. The hardest part of this problem was parsing the HTML, but Ruby made that pretty easy. Creating the RSS information was very simple.

The code below has plenty of comments because my friend doesn't know Ruby. His background is more .Net, Java, and Perl.

#  This is a simple example.  It reads the Hot Topics forum
# on Fatwallet.com finding all topics rated "Better" or higher.
# It will print these topics and their URL as it finds them.
# Finally, it prints the RSS feed for this information.
# See: http://www.fatwallet.com/c/18/
#
# Three steps to run this script...
#
# You can get Ruby from http://www.ruby-lang.org/en/downloads/
# I highly recommend the One-click Installer for Windows
#
# This program uses an extra library, Hpricot, to parse HTML.
# To get Hpricot installed, use Ruby's package manager "Gems"
# Just run the following at your DOS cmd prompt
# gem install hpricot
#
# To run the program from the DOS cmd prompt...
# ruby RubyBot-forRich.rb
#
# RSS feed creation using the RSS library for Ruby
# see: http://www.ruby-doc.org/stdlib/libdoc/rss/rdoc/index.html
# tutorial: http://www.cozmixng.org/~rwiki/?cmd=view;name=RSS+Parser%3A%3ATutorial.en

# "require" is not like "using" in C# nor "#include" in C++.
# require searches the library for the correct ruby file and executes it.
require 'rubygems'
require 'open-uri'
require 'rss/maker'
begin
  require 'hpricot'
rescue LoadError # this is like a Try Catch, but there are nifty differences
  puts 'Please run "gem install hpricot" before running this program.'
  exit
end

BETTER = 4 # a constant

doc = Hpricot(open('http://www.fatwallet.com/c/18/'))

rss = RSS::Maker.make("1.0") { |maker|
  #
  # Let me explain what is happening...
  # The ".make" method created a Maker object, passed it into
  # the block (i.e. everything between { and }) as the variable 'maker'.
  # Executed the block. Then ".make" returns the RSS object that
  # was built while executing the block.
  #
  maker.channel.about = 'http://www.fatwallet.com/c/18/'
  maker.channel.title = 'Fatwallet.com Hot Deals Forum'
  maker.channel.description = 'Hot deals rated "Better" or higher.'
  maker.channel.link = 'http://www.fatwallet.com/c/18/'

  puts 'Searching...' # puts is similar to WriteLine in .Net

  (doc/'tr').each { | tr |
    #
    # (doc/'tr') => performed an XPath search on doc and got a collection of <tr> nodes
    # .each => iterates over the collection passing elements one at a time to the "{ |tr| ... }" block
    # { | tr | ... } => this is a Closure, the variable 'tr' gets set to the element passed in
    #
    # similar to C#...
    # foreach( Node tr in doc.FindAllNodes( 'tr' ) ) { ... }
    # but not really since it is using a closure.
    #
    # I had to loop on <TR> html tags because I will need to reference this tag later.
    #
    (tr/'td/img[@title]').each { |img|
      if img['title'] =~ /rating: (\d+)/
        # $1 is the first group from the match, the rating number
        if $1.to_i >= BETTER
          (tr/'a[@href]').each { | a |
            if a['href'] =~ /^\/t\/18/
              puts "http://fatwallet.com#{a['href']} #{a.inner_html}"

              item = maker.items.new_item
              item.link = "http://fatwallet.com#{a['href']}"
              item.title = a.inner_html

            end
          } # each a
        end # if rating >= better
      end # if title is rating
    } # each img
  } # each tr
} # RSS maker

puts "\nRSS..."
puts rss
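
The rating filter in the middle of that script can be tried on its own, without Hpricot or a network connection. Here is a minimal sketch that applies the same regular expression and threshold to a few made-up title strings. The sample titles and the `hot?` helper name are assumptions for illustration; they are not part of the script above.

```ruby
BETTER = 4 # same threshold as in the script above

# Hypothetical helper: true when an image title such as "rating: 5"
# carries a rating of BETTER or higher.
def hot?(title)
  # Regexp#match returns nil when the title is not a rating at all
  m = /rating: (\d+)/.match(title)
  !m.nil? && m[1].to_i >= BETTER
end

# Made-up sample titles standing in for the <img title="..."> attributes
titles = ['rating: 5', 'rating: 3', 'new topic', 'rating: 4']
hot = titles.select { |t| hot?(t) }
puts hot.inspect
```

Using the MatchData object instead of `$1` keeps the helper free of global state, which is handy once the check moves out of the nested blocks.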

Long time no post... why?

Two months and nothing posted. The reason is my wife is pregnant with our third child. Just like the first two, the pregnancy is very difficult. So I haven't done much blogging nor extra coding.

Saturday, July 22, 2006

Let ActiveRecord support Enterprise Databases.

I am currently investigating using ActiveRecord's Migrations to support a legacy enterprise application. Migrations could help us because we currently support multiple versions of the product, which means multiple versions of the schema. Development and QA are already spending too much time reinstalling different versions of the database.

So can ActiveRecord Migrations support my application? Nope; not right out of the box.

Can I tweak ActiveRecord to support my application? YES! It isn't too hard to extend/override ActiveRecord.

Why can't ActiveRecord support my application out of the box? The short answer is ActiveRecord was written to only support a database neutral schema. Supporting multiple databases means only supporting the lowest common denominator. So db specific types in my schema like tinyint and smallint get simplified to integer. ActiveRecord doesn't support all the different constraints that are available: foreign keys, composite keys, triggers. Also, I noticed ActiveRecord did not capture any of my Views (SqlServer).

In addition, I have a deployment problem. I have to ship and deploy my application to customers. Right now, I can't ship and install Ruby on those machines. So I'd rather just have ActiveRecord Migrations produce a SQL file that I can deploy.

After a little experimentation, I discovered it is possible to overcome these issues fairly easily. The first question: can I run Migrations without Rails? Yes, and this is easily accomplished as described on PragDave's blog.

Next, I wanted to know how I could modify ActiveRecord to serve my needs. Thankfully Ruby is such a flexible language that it allows modification of existing classes. So I can override ActiveRecord methods as needed. For example:

require 'rubygems'
require 'active_record'

module ActiveRecord
  module ConnectionAdapters # :nodoc:
    class SQLServerAdapter
      def execute( sql, name = nil )
        puts sql
        puts "GO"
        puts
      end
    end
  end
end



In the example above, I first load ActiveRecord, then modify the SQLServerAdapter's execute method (my database is SqlServer). It turns out that this method is called by Migration to send SQL to the database. So in this example, I am simply sending the SQL to STDOUT. Notice that I can also modify the output. In this example I add the “GO” command between SQL statements. I could have just as easily added transactions or error checking. With a few more tweaks, I could have sent this to a file to generate a SQL script from the Migration.
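
To make the "send it to a file" idea concrete, here is a small sketch of the same reopen-and-override trick that runs without ActiveRecord or a database. `FakeAdapter`, `create_widgets`, and `sql_out` are hypothetical names standing in for the real adapter; treat it as an illustration of the technique, not as what Migrations does internally.

```ruby
require 'stringio'

# Stand-in for the real adapter so the sketch runs on its own.
# FakeAdapter and its methods are hypothetical.
class FakeAdapter
  def execute(sql, name = nil)
    raise 'would talk to the database here'
  end

  def create_widgets
    execute('CREATE TABLE widgets (id int)')
    execute('CREATE INDEX ix_widgets ON widgets (id)')
  end
end

# Reopen the class and replace execute, the same trick as above,
# but write to any IO-like object instead of STDOUT.
class FakeAdapter
  class << self
    attr_accessor :sql_out # where the script goes
  end
  self.sql_out = $stdout

  def execute(sql, name = nil)
    out = self.class.sql_out
    out.puts sql
    out.puts 'GO'
    out.puts
  end
end

# Usage: capture the script in memory
buffer = StringIO.new
FakeAdapter.sql_out = buffer
FakeAdapter.new.create_widgets
puts buffer.string
```

Swapping the StringIO for `File.open('schema.sql', 'w')` would produce a deployable script, which is the shape of solution I am after.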

ActiveRecord contains code to simplify data types for database neutrality. Since the legacy database uses many other types, this simplification is an issue. So after examining the source, I found a couple of places where the simplification is applied. By overriding these methods it is easy to get ActiveRecord to support all the data types required. For example, the code below adds several types.

module ActiveRecord
  module ConnectionAdapters # :nodoc:
    class SQLServerAdapter
      def native_database_types
        SQLServerAdapter.native_database_types
      end

      def self.native_database_types
        {
          :binary => { :name => "binary", :haslimit => true },
          :bit => { :name => "bit"},
          :char => { :name => "char", :haslimit => true },
          :decimal => { :name => "decimal" },
          :int => { :name => "int" },
          :nvarchar => { :name => "nvarchar" },
          :real => { :name => "real" },
          :smallint => { :name => "smallint" },
          :tinyint => { :name => "tinyint" },
          :timestamp => { :name => "timestamp" },
          :uniqueidentifier => { :name => "uniqueidentifier" },
          :varchar => { :name => "varchar", :limit => 255, :haslimit => true },
          :primary_key => "int NOT NULL IDENTITY(1, 1) PRIMARY KEY",
          :text => { :name => "text" },
          :float => { :name => "float", :limit => 8 },
          :datetime => { :name => "datetime" },
          :image => { :name => "image"},
        }
      end
    end

    class ColumnWithIdentity
      def initialize(name, default, sql_type = nil, is_identity = false, null = true, scale_value = 0)
        super(name, default, sql_type, null)
        @identity = is_identity
        @is_special = sql_type =~ /text|ntext|image/i ? true : false
        @scale = scale_value
        # SQL Server only supports limits on a few types
        @limit = nil unless SQLServerAdapter.native_database_types[@type][:haslimit] == true
      end

      #DO NOT SIMPLIFY, Just use the native type name; trim off size
      def simplified_type(field_type)
        field_type.slice( /[^\(]*/).to_sym
      end
    end
  end
end



This example shows how easy it is to change ActiveRecord's behavior as needed.

Now the code above isn't a complete solution, but it will get SchemaDump to output the correct types that my legacy system is using. If you look at the original methods you can see the changes were pretty simple to make. One refactoring I did was to make SQLServerAdapter's instance method native_database_types into a class method. Then I could access it from ColumnWithIdentity, where it was copied code before. Also, I added the :haslimit key to make the code in ColumnWithIdentity more data driven off the native_database_types hash. Before, it had the types with limits hard coded into the ColumnWithIdentity initialize method.
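
The two small pieces doing the work here can be exercised in isolation. The sketch below copies the `simplified_type` expression and the `:haslimit` lookup into plain Ruby; the cut-down `NATIVE_TYPES` table and the `keep_limit?` helper are assumptions made so it runs without ActiveRecord.

```ruby
# A cut-down copy of the type table above; only :haslimit matters here.
NATIVE_TYPES = {
  :varchar  => { :name => 'varchar', :haslimit => true },
  :tinyint  => { :name => 'tinyint' },
  :datetime => { :name => 'datetime' }
}

# Same expression as simplified_type: keep everything before the first "("
def simplified_type(field_type)
  field_type.slice(/[^\(]*/).to_sym
end

# Hypothetical helper mirroring the @limit line in ColumnWithIdentity
def keep_limit?(field_type)
  NATIVE_TYPES[simplified_type(field_type)][:haslimit] == true
end

puts simplified_type('varchar(255)').inspect  # :varchar
puts simplified_type('tinyint').inspect       # :tinyint
puts keep_limit?('varchar(255)')              # true
puts keep_limit?('tinyint')                   # false
```

Adding a new type with a size then only means adding one entry to the hash, which is the point of driving the check from data.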

What about adding new features? This is probably the easiest of all. A frequent request is to support composite keys. Just as an example of extending the DSL, the following will create a primary key constraint and support multiple columns in the key.

module ActiveRecord
  module ConnectionAdapters # :nodoc:
    module SchemaStatements
      def add_pk(table_name, column_names)
        quoted_column_names = column_names.map { |e| quote_column_name(e) }.join(", ")
        sql = "ALTER TABLE #{table_name} ADD CONSTRAINT PK_#{table_name} PRIMARY KEY CLUSTERED ( #{quoted_column_names} ) ON [PRIMARY]"
        execute(sql)
      end
    end
  end
end



This is simple DSL programming in Ruby. Another approach is to allow the :primary_key option for the create_table method to take an array of column names. I'll leave that exercise for later.
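
To see the SQL that add_pk builds without loading ActiveRecord, here is a standalone sketch. The bracket-quoting `quote_column_name` below is an assumed stand-in for the adapter's own method, and `pk_sql` plus the table and column names are made up for the example.

```ruby
# Assumption: SQL Server style bracket quoting, standing in for the
# adapter's quote_column_name.
def quote_column_name(name)
  "[#{name}]"
end

# Same string building as add_pk, but returning the SQL instead of executing it
def pk_sql(table_name, column_names)
  quoted = column_names.map { |e| quote_column_name(e) }.join(', ')
  "ALTER TABLE #{table_name} ADD CONSTRAINT PK_#{table_name} " \
    "PRIMARY KEY CLUSTERED ( #{quoted} ) ON [PRIMARY]"
end

puts pk_sql(:order_lines, [:order_id, :line_no])
```

Inside a migration, the real method would be called the same way, for example `add_pk :order_lines, [:order_id, :line_no]`.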

These examples show what is possible. I know there is a raging debate about supporting db neutrality versus db specifics. DHH has expressed his intention to keep Rails db neutral, because that is best for the world he lives within. For those of us unfortunate enough to live with legacy databases, we need an ActiveRecord that fully supports our database. Why not support both? Why not set up a method for the enterprise folks to add the database specifics to ActiveRecord? Maybe some form of plug-in or stable API that we could extend (my changes above risk being broken by future versions of ActiveRecord). To satisfy the db neutral camp, just generate warnings or errors if db specific extensions are used when ActiveRecord is configured to be db neutral.

Happy Migrating!

Friday, June 30, 2006

Thursday, June 22, 2006

Ruby: Meta Programming and Stack Traces

A couple of times I have run into trouble debugging or tracing a method in Ruby. Usually you can just call Kernel.caller to get a stack trace. But what if the method was generated? You don't get the correct location. For example:

class Kung
  mstr = %-
    def foo
      puts 'Hello World from Kung.foo'
      puts caller(0)
    end
  -
  module_eval mstr
end

Kung.new.foo


Which generates the following output:

Hello World from Kung.foo
(eval):4:in `foo'
Kung-foo.rb:11


The stack trace only shows "(eval):4:in `foo'", which is almost useless. The "(eval)" is a clue that the method was dynamically created using meta-programming. In this simple example, it is easy to find the dynamic code since it is near the caller "Kung-foo.rb:11". However, in a real project it is frequently located far away, possibly in other source files.

To fix the stack trace, the author should use the optional arguments to module_eval as follows:

class Monkey
  line, mstr = __LINE__, %-
    def see
      puts 'Hello World from Monkey.see'
      puts caller(0)
    end
  -
  module_eval mstr, __FILE__, line
end

Monkey.new.see


The output now shows the correct line number and file name:

Hello World from Monkey.see
Monkey-see.rb:5:in `see'
Monkey-see.rb:11


Update: after reading the code in ActiveSupport core_ext\attribute_accessors.rb, I found a nice way to do the above with fewer lines of code:

class Monkey
  # __LINE__ + 1 because the heredoc body starts on the line after this call
  module_eval(<<-EOS, __FILE__, __LINE__ + 1)
    def see
      puts 'Hello World from Monkey.see'
      puts caller(0)
    end
  EOS
end

Monkey.new.see
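
A quick way to check that the location arguments line up is to compare the reported line against `__LINE__` directly. This is only a sketch: the `Gorilla` class, the `where` method, and the `CALL_LINE` constant are made-up names, and the exact text of a stack frame varies a little between Ruby versions, so the check looks only at the line number.

```ruby
class Gorilla
  CALL_LINE = __LINE__ + 1 # line number of the module_eval call below
  module_eval(<<-EOS, __FILE__, __LINE__ + 1)
    def where
      caller(0).first
    end
  EOS
end

frame = Gorilla.new.where
# Pull the line number out of "file:LINE:in ..."
reported = frame[/:(\d+):in/, 1].to_i
puts frame
# The caller(0) line sits two lines below the module_eval call
puts reported == Gorilla::CALL_LINE + 2
```

With a heredoc, the generated code starts on the line after the module_eval call, so passing plain `__LINE__` would report every frame one line too early.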

Monday, June 19, 2006

Ruby Class Variables, Attributes and Constants

(For a related post, see Use Class Instance Variables Not Class Variables)


Writing a little Ruby code the other day, I wanted to use a class variable, but it didn't behave as expected.


class Class_Variable
  @@var = 1
  def Class_Variable.report; @@var end
end

class Child_V < Class_Variable
  @@var = 2
end

puts Class_Variable.report #=> 2
puts Child_V.report #=> 2


I was surprised by the result. Most other languages would have the child class shadow the parent field, but Ruby shared it! Ruby does provide Class Attributes and Constants as alternatives, but each has its own semantics.

I threw together the following code to show the differences.


01: class Class_Variable
02:   @@var = 1
03:   def Class_Variable.report; @@var end
04: end
05:
06: class Child_V < Class_Variable
07:   @@var = 2
08: end
09:
10: puts Class_Variable.report #=> 2
11: puts Child_V.report #=> 2
12:
13:
14:
15: class Class_Attribute
16:   @var = 1 #class attribute
17:
18:   def initialize
19:     @var = 2 #instance attribute
20:   end
21:
22:   def report
23:     @var # instance attribute, not the class attribute
24:   end
25:
26:   def Class_Attribute.report
27:     @var # class attribute
28:   end
29: end
30:
31: class Child_A < Class_Attribute
32:   @var = 3
33: end
34:
35: puts Class_Attribute.report #=> 1
36: puts Class_Attribute.new.report #=> 2
37: puts Child_A.report #=> 3
38: puts Child_A.new.report #=> 2
39:
40:
41:
42: class Class_Constant
43:   VAR = [ 'a' ]
44:   VAR2 = [ 'b' ]
45:
46:   def self.report
47:     VAR2[0]
48:   end
49: end
50:
51: class Child_C < Class_Constant
52:   VAR2 = [ 'c' ]
53: end
54:
55: puts Class_Constant::VAR[0] #=> 'a'
56: puts Class_Constant::VAR2[0] #=> 'b'
57: puts Class_Constant.report #=> 'b'
58: puts Child_C::VAR[0] #=> 'a' (inherited from Class_Constant)
59: puts Child_C::VAR2[0] #=> 'c'
60: puts Child_C.report #=> 'b'
61:


First notice that Class Variables are shared with their subclasses (lines 1-11). This differs greatly from Java and C#, which shadow inherited variables.

Class Attributes are "class private". That is, the child has no access to the parent attribute with the same name. Constants are inherited (line 58), but a child that defines a constant with the same name shadows the parent's (line 59). Class methods are also inherited. So calling "report" for Child_A and Child_C displays a significant difference between using Class Attributes and Constants (lines 37 and 60): Child_A.report reads Child_A's own @var, while Child_C.report still returns 'b' because the constant is looked up in Class_Constant, where report was defined.

Unfortunately, none of these alternatives matches the behavior seen in other languages. This can cause some confusion when going from Java or C# to Ruby. Class Attributes with accessor methods are the closest match to other languages. However, the syntax similarities between class attributes and instance attributes can cause problems (lines 16, 19, 23, 27).
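
For reference, here is a minimal sketch of a Class Attribute with accessor methods. The class names `Base`, `Child`, and `Orphan` are made up for the example; the `class << self` plus `attr_accessor` combination is one common way to write it, not the only one.

```ruby
class Base
  class << self
    attr_accessor :var # reader/writer for the class-level @var
  end
  self.var = 1
end

class Child < Base
  self.var = 3 # sets Child's own @var; Base's is untouched
end

class Orphan < Base
end

puts Base.var            # 1
puts Child.var           # 3
puts Orphan.var.inspect  # nil
```

The accessor methods are inherited, but the value is not: a subclass that never assigns its own value sees nil rather than the parent's 1, which is one more way this differs from a static field in Java or C#.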