Adding Formula Fields to a Database with Ruby on Rails

Every now and then you might encounter a situation where you need to have a database column that has a value based on a formula consisting of values of other columns. Unfortunately, there is no standard way of embedding this into the SQL definition of a table. You must keep track of updating this value on your own when using the database.

But, this begs the question, why would I want to have a formula field? Can't I just do all my calculations when I am fetching my data with a SQL query? Sure, you can. However, if you are doing calculations on vast amounts of data, having part of this data pre-calculated in a formula field can give you a solid performance boost.

If you're using Ruby on Rails for your project, there's a very simple solution you can implement. I'll show you how through a trivial example. In this example, we imagine a teacher wanting to store his students' assignment marks.

Let's go ahead and create our rails project called teacherexample. I'll use MySQL as my database in this example, but the concept applies to any database system.


rails --database=mysql teacherexample
OUTPUT:
create
create app/controllers
.
.
create log/test.log

Now, we need to create our database table. We'll create a model to store the marks of the hypothetical assignments. We can do this using the following command:


cd teacherexample
ruby script/generate model mark student_id:integer assignment1:integer assignment2:integer
OUTPUT:
exists app/models/
exists test/unit/
.
.
create db/migrate/20090608005136_create_marks.rb

The command above created a model with all the fields that we need. So let's look at the migration file. This file is located at teacherexample/db/migrate and ends with _create_marks.rb. As things stand, the file has the following contents:

class CreateMarks < ActiveRecord::Migration
  def self.up
    create_table :marks do |t|
      t.integer :assignment1
      t.integer :assignment2

      t.timestamps    # remove this line
    end
  end

  def self.down
    drop_table :marks
  end
end

Remove the line noted above. This line will create two timestamp fields in the database that we won't need in this example. At this point, we're ready to migrate this structure to our database, and we can do this with the rake migrate command:

rake db:migrate

Now, I'll open the model Mark (found in teacherexample/app/models/mark.rb) and write a function that will populate the 'marks' table with 500,000 records. This is the model Mark after writing that function:

class Mark < ActiveRecord::Base
  def self.populate_database
    Benchmark.realtime do
      1.upto(500000) do |num|
        mark = Mark.new
        mark.assignment1 = rand(100).to_i   # random numbers to store as
        mark.assignment2 = rand(100).to_i   # assignment marks
        mark.save
      end
    end
  end
end

We are now ready to populate the database. I'll fire the ruby console and run the function above:

ruby script/console
Loading development environment (Rails 2.3.2)
>> Mark.populate_database
=> 784.340764045715
>>

Executing the function to insert 500,000 records took approximately 784 seconds (around 13 minutes).

Now, I will open MySQL and create a query to retrieve the number of students who passed (i.e. those students who have an average greater than or equal to 50%).

mysql> SELECT COUNT(*) FROM marks WHERE ((assignment1 + assignment2)/2) >= 50;
+----------+
| count(*) |
+----------+
|   247241 |
+----------+
1 row in set (0.48 sec)

As we can conclude, when we did not use a formula field it took 0.48 seconds to calculate the number of students who got an average of 50 or more. Now it's time to try this using a formula field.

Edit the database migration file as follows:

class CreateMarks < ActiveRecord::Migration
  def self.up
    create_table :marks do |t|
      t.integer :assignment1
      t.integer :assignment2
      t.float :average     #Add this line
    end
  end

  def self.down
    drop_table :marks
  end
end

The line we added above will create an extra field for us to calculate the average. Let's execute this change and empty the database using the following rake command at the directory of the rails project:

rake db:migrate:reset

We need to tell rails to update the formula field every time a record is created and/or changed. This can be done by overriding the save() method. Make the following change to the marks.rb file:

class Mark < ActiveRecord::Base
  def self.populate_database
    Benchmark.realtime do
      1.upto(500000) do |num|
        mark = Mark.new
        mark.assignment1 = rand(100).to_i
        mark.assignment2 = rand(100).to_i
        mark.save
      end
    end
  end
  
  # Overriding the save function so as to update the average every time the object gets saved
  def save
    self.average = (assignment1 + assignment2) / 2.0;
    super   # calls the rails save function to store our object to the database
  end
end

Populating the database with our new changes, we get the following:

ruby script/console
Loading development environment (Rails 2.3.2)
>> Mark.populate_database
=> 788.891064167023
>>

Populating the database took approximately 789 seconds, only 5 seconds more than inserting the records without the formula field. As I mentioned earlier, we need to know the number of students who got an average greater than or equal to 50. Since the average is already calculated, we can simply check if a student has passed by checking the average column as follows:

mysql> select count(*) from marks where average >= 50;
+----------+
| count(*) |
+----------+
|   247919 | 
+----------+
1 row in set (0.25 sec)

As you can see from the results above, calculating the average when we inserted the data improved our query's efficiency. Instead of taking 0.48 seconds it now takes 0.25 seconds, almost twice as efficient!

Of course, nothing comes for free, this efficiency is on behalf of a little extra overhead when inserting queries and the disk space required to store the pre-calculated data. Thus, adding a formula field really depends on your application needs and on how often you query your data.

6 thoughts on “Adding Formula Fields to a Database with Ruby on Rails”

  1. GarykPatton :

    How soon will you update your blog? I’m interested in reading some more information on this issue.

    I should be publishing a post this weekend that also deals with playing with databases in ruby on rails. But, it will not be covering the issue of formula fields. Is there something in particular you were looking for that I have possibly missed?

  2. Assalamo Alaykom,

    Jazak Allah Khayr for your nice article which what I was looking for.

    I just have a small note:

    You mentioned that running this command: ruby script/generate model mark student_id:integer assignment1:integer assignment2:integer

    Produced the following model (I will include parts of it):

    def self.up
    create_table :marks do |t|
    t.integer :assignment1
    t.integer :assignment2

    t.timestamps # remove this line
    end

    But, where is: t.integer :student_id?

    Provided that I’m using Rails 3.0.0, this is what I get for this part of the model:

    create_table :marks do |t|
    t.integer :student_id
    t.integer :assignment1
    t.integer :assignment2

    t.timestamps
    end

    Have you removed: “t.integer :student_id” from your example?

    Thanks.

  3. We alaikum el-salam Abder-Rahman,

    This article was written before Rails 3.0.0 came out. This was done using Rails 2.x.x if I remember correctly, so you might see some minor discrepancies. In that version of Rails, “t.integer :student_id” was not part of the model.

Leave a Reply

Your email address will not be published. Required fields are marked *