Written by Jiří Pospíšil, a full stack developer currently working primarily with Ruby on Rails and Ember.js.

You can get in touch with me at mekishizufu@gmail.com


The definitive guide to Arel, the SQL manager for Ruby

Published on Jun 16, 2014

Arel is the kind of library that many of us Rails developers use on a daily basis and might not even know about it. So what’s this library whose name only pops up when everything else fails all about?

It’s all about providing frameworks with a way of building and representing SQL queries. It’s not the kind of library you would typically want to use directly (although you could as shown in a minute). Arel is meant to be the basic building block upon which frameworks build their own APIs that are more suitable for the end user.

One of those frameworks is ActiveRecord (AR), the default ORM in Rails. ActiveRecord’s responsibility is to provide a connection to the database, a convenient way to specify relationships between your models, provide a nice query interface and all the other things we enjoy.

# ActiveRecord
User.first.comments.where(created_at: 2.days.ago..Time.current).limit(5)

Behind the scenes, ActiveRecord uses Arel to build the queries and ultimately calls out to it to get the final SQL before shipping it to the database of your choice.

So how exactly Arel achieves building the queries in such a flexible way? By building an AST. Arel internally operates on AST nodes - you modify the query via a method call, Arel modifies or creates the appropriate node in the tree.

This kind of representation holds two important properties. First, composability. By being composable you gain the power to build the query iteratively, piece by piece, and even combine several queries together. Many parts of the API (and consequently AR’s API) would be impossible or at least very difficult handle without this property.

# ActiveRecord

bob = User.where(email: "bob@test.com").where(active: true)
# => SELECT "users".* FROM "users" WHERE "users"."email" = 'bob@test.com' AND "users"."active" = 't'

details = User.select(:id, :email, :first_name).order(id: :desc)
# => SELECT "users"."id", "users"."email", "users"."first_name" FROM "users" ORDER BY "users"."id" DESC

bob.merge(details).first
# => SELECT "users"."id", "users"."email", "users"."first_name" FROM "users"
#    WHERE "users"."email" = 'bob@test.com' AND "users"."active" = 't'
#    ORDER BY "users"."id" DESC LIMIT 1

While a contrived example, it is sufficient to show that it’d be very difficult to work with these queries without some sort of abstract representation.

The other equally important property is the completely obliviousness to the outside world. Arel doesn’t care what’s going to happen with the result. It might end up converted into a SQL query or into an entirely different format. In fact, Arel is able to convert the query into the Graphviz’s dot format and you can create pretty diagrams out of it (more on that later).

So far we’ve seen only ActiveRecord’s query interface, the part built on top of Arel. Let’s get below the surface and start working with Arel directly. To play along, use the following instructions. The script will download the correct version of libraries and leave you inside a Pry REPL instance (run bundle console if you’ve left the REPL and want to come back). It’s always a good idea to inspect all 3rd party scripts before you run them.

cd /tmp
mkdir arel_playground && cd arel_playground

wget http://jpospisil.com/arel_setup.sh
# or
curl -L -o arel_setup.sh http://jpospisil.com/arel_setup.sh

bash ./arel_setup.sh

To stay current for the foreseeable future, the text is based on soon-to-be released version of Arel. The text also contains a lot of links to the actual Arel’s source code, you are strongly encouraged to look around the file beyond the highlighted area to see all of the options!

Diving in with SelectManager

Let’s start by building a select query that will give us all users. First, we need to create an object representing the table itself. Notice that you can name the table whatever you want, it doesn’t have to exist anywhere.

users = Arel::Table.new(:users)

Arel::Table itself doesn’t do much but it has a lot of handy methods which are responsible for delegating the calls deeper into the system. The method we are interested in now is the project method. The name comes from relational algebra but rest assured, it’s just a plain select.

select_manager = users.project(Arel.star)

Notice the use of Arel.star, a convenience method for the * character. What we got back is an instance of Arel::SelectManager, the object responsible for assembling of the select query. Now we should be able to get the resulting SQL from select_manager.

select_manager.to_sql
# => NoMethodError: undefined method `connection' for nil:NilClass

And it didn’t work. If you think about it, the failure kind of makes sense (although the error should be handled more gracefully) because we didn’t specified any database details and Arel has no way of knowing for what database we want the query generated. Databases may differ in syntax, capabilities and even in character escaping. Let’s get ourselves an ActiveRecord database connection and try again.

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")

users          = Arel::Table.new(:users, ActiveRecord::Base)
select_manager = users.project(Arel.star)

select_manager.to_sql
# => SELECT * FROM "users"

Notice we passed ActiveRecord::Base to the Arel::Table’s constructor. We could have also set it globally via Arel::Table.engine=. With all that in place, we finally have our precious SQL query.

The interesting thing is the collaboration between Arel and ActiveRecord. Arel is technically independent from ActiveRecord but it needs to get the database details from somewhere and currently it uses ActiveRecord. More specifically, Arel requires ActiveRecord’s APIs. There’s even a fake ActiveRecord implementation, FakeRecord, that is used to run the Arel’s tests. In the past you needed a running MySQL server.

Getting picky

Querying for all users' details is nice but let’s be more specific. Say we want to select only the users' ids and names. The key abstraction Arel provides for working with attributes (column names) is Arel::Attribute.

Arel::Attribute represents a single column of an arbitrary name. The easiest way to get a hold of an Arel::Attribute for a table is to use the Arel::Table#[] method. We can use the result right away in the project method.

select_manager = users.project(users[:id], users[:name])
select_manager.to_sql
# => SELECT "users"."id", "users"."name" FROM "users

As you’ve probably noticed, the class gets included with a bunch of modules which add a lot of functionality. The first module, Arel::Expressions, adds the common aggregate functions.

select_manager = users.project(users[:comments_count].average)
select_manager.to_sql
# => SELECT AVG("users"."comments_count") AS avg_id FROM "users"

The results of these aggregate functions are kept in variables with hardcoded names (avg_id in this case). Fortunately, Arel::AliasPredication comes to our rescue.

select_manager = users.project(users[:vip].as("status"), users[:vip].count.as("count")).group("vip")
select_manager.to_sql
# => SELECT "users"."vip" AS status, COUNT("users"."vip") AS count FROM "users"  GROUP BY vip

The Arel::Math module is pretty neat. It implements the common math operators so that we can use them directly on the attributes as if we’re working with the values.

select_manager = users.project((users[:stared_comments_count] / users[:comments_count]).as("ratio"))
select_manager.to_sql
# => SELECT "users"."stared_comments_count" / "users"."comments_count" AS ratio FROM "users"

Extending our index finger

Select queries which return data from the whole table are quite rare, usually you want to have more fine grained control. Let’s see how Arel handles these cases. The starting point is again Arel::Attribute. More specifically, it’s one of its included modules, Arel::Predications. By looking at the code you can see a lot of handy methods, many of which do not have their equivalent in ActiveRecord' APIs.

select_manager = users.project(Arel.star).where(users[:id].eq(23).or(users[:id].eq(42)))
select_manager = users.project(Arel.star).where(users[:id].eq_any([23, 42]))
select_manager.to_sql
# => SELECT * FROM "users"  WHERE ("users"."id" = 23 OR "users"."id" = 42)

For more complicated queries, it’s usually best to the build the parts individually and combine them together at the end.

admins_vips    = users[:admin].eq(true).or(users[:vip].eq(true))
with_karma     = users[:karma].gteq(5000).and(users[:hellbanned].eq(false))

select_manager = users.project(users[:id]).where(admins_vips.or(with_karma)).order(users[:id].desc)
select_manager.to_sql
# => SELECT COUNT("users"."id") FROM "users" WHERE (("users"."admin" = 't' OR "users"."vip" = 't') 
#      OR "users"."karma" >= 5000 AND "users"."hellbanned" = 'f')
#    ORDER BY "users"."id" DESC

The more the merrier

Next, let’s take a look at join statements. In line with the previously shown API, Arel exposes joins directly from Arel::SelectManager. As expected, Arel supports the usual INNER JOIN, and LEFT, RIGHT, FULL OUTER JOIN kinds.

comments       = Arel::Table.new(:comments, ActiveRecord::Base)

select_manager = users.project(Arel.star).join(comments).on(users[:id].eq(comments[:user_id]))
select_manager.to_sql
# => SELECT * FROM "users" INNER JOIN "comments" ON "users"."id" = "comments"."user_id"

To create the remaining kinds of joins, we need to explicitly pass a second argument to the join method.

select_manager = users.project(Arel.star).join(comments, Arel::Nodes::OuterJoin).
  on(users[:id].eq(comments[:user_id])).
  having(comments[:id].count.lteq(16)).
  take(15)

select_manager.to_sql
# => SELECT * FROM "users" LEFT OUTER JOIN "comments" ON "users"."id" = "comments"."user_id" 
#    HAVING COUNT("comments"."id") <= 16 LIMIT 15

Since the need for OuterJoin is very common, there’s a shortcut called outer_join, which internally calls the join method with the Arel::Nodes::OuterJoin argument for us. To get the remaining kinds of joins, there are Arel::Nodes::FullOuterJoin and Arel::Nodes::RightOuterJoin nodes available.

The rarely used CROSS JOIN kind is not directly supported. What’s also not supported out of the box is the USING clause but as with the previous case, we can get around that by resorting to creating Arel::Nodes::SqlLiteral manually or better yet by rewriting the query to use the supported constructs.

There’s always more

Arel comes with support even for slightly advanced features such as WITH statements or WINDOW functions. Let’s try to replicate an example 7.8.1. SELECT in WITH from the PostgreSQL manual. The query is quite complicated, it consists of 2 WITH statements and a few subqueries. Let’s focus first on the WITH clauses regional_sales and top_regions.

orders          = Arel::Table.new(:orders, ActiveRecord::Base)
reg_sales       = Arel::Table.new(:regional_sales, ActiveRecord::Base)
top_regions     = Arel::Table.new(:top_regions, ActiveRecord::Base)

reg_sales_query = orders.project(orders[:region], orders[:amount].sum.as("total_sales")).
                    group(orders[:region])
reg_sales_as    = Arel::Nodes::As.new(reg_sales, reg_sales_query)

Nothing we haven’t seen before. The only exception is the explicit instantiation of Arel::Nodes::As. There doesn’t seem to be a way around it as you cannot create an alias via the usual as method.

top_regions_subquery = reg_sales.project(Arel.sql("SUM(total_sales) / 10"))
top_regions_query    = reg_sales.project(reg_sales[:region]).
                        where(reg_sales[:total_sales].gt(top_regions_subquery))
top_regions_as       = Arel::Nodes::As.new(top_regions, top_regions_query)

The use of Arel.sql is not ideal, however, as with the previous part, there is not a way to use math operations on the result of the sum call.

attributes = [orders[:region], orders[:product], orders[:quantity].as("product_units"),
               orders[:amount].as("product_sales")]

res = orders.project(*attributes).where(orders[:region].in(top_regions.project(top_regions[:region]))).
        with([reg_sales_as, top_regions_as]).group(orders[:region], orders[:product])

res.to_sql

With all of that in place, we have our final query. If we look at the parts individually, they are pretty simple. Overall, however, the code is longer than a pure SQL solution. The fact doesn’t matter when using Arel pragmatically but if composed by hand, one has to always consider whether it’s actually worth the effort.

SelectManager is not the only one

So far all we’ve been doing was writing select queries via SelectManager, but Arel of course supports the other operations as well. Let’s quickly take a look at deleting. There are two ways you can create a delete query. The first way is to explicitly instantiate Arel::DeleteManager.

delete_manager = Arel::DeleteManager.new(ActiveRecord::Base)
delete_manager.from(users).where(users[:id].eq_any([4, 8]))
delete_manager.to_sql
# => DELETE FROM "users" WHERE ("users"."id" = 4 OR "users"."id" = 8)

The other way, although it seems deprecated, is to create the delete statement from a select statement by calling compile_delete (there are similar methods for the other operations as well). By looking at the code we can see that all it does is pick values out of the object it’s mixed into (Arel::SelectManager) and passing it to a new instance of Arel::DeleteManager.

select_manager = users.project(users[:id], users[:name]).where(users[:banned].eq(true))
select_manager.to_sql
# => SELECT "users"."id", "users"."name" FROM "users"  WHERE "users"."banned = 't'

delete_manager = select_manager.compile_delete
delete_manager.to_sql
# => DELETE FROM "users" WHERE "users"."banned" = 't'

The managers for the remaining operations, InsertManager and UpdateManager, work in a similar fashion.

insert_manager = Arel::InsertManager.new(ActiveRecord::Base)
insert_manager.insert([[users[:name], "Bob"], [users[:admin], true]])
insert_manager.to_sql
# => INSERT INTO "users" ("name", "admin) VALUES ('Bob', 't')

Notice that Arel::InsertManager is able to figure out the name of the table we’re inserting to automatically through the use of Arel::Attribute. If we’re to use string literals instead, we’d have to specify the table name via the into method. The same is not offered in Arel::UpdateManager and we have to use table.

update_manager = Arel::UpdateManager.new(ActiveRecord::Base)
update_manager.table(users).where(users[:id].eq(42))
update_manager.set([[users[:name], "Bob"], [users[:admin], true]])
update_manager.to_sql
# => UPDATE "users" SET "name" = 'Bob', "admin" = 't' WHERE "users"."id" = 42

The story of .to_sql

Throughout the article we’ve been calling .to_sql in almost every example and never actually talked about how it works. As mentioned in the beginning, Arel internally represents all queries as nodes in an abstract syntax tree. The managers create and modify these trees. Naturally, something later has to take the resulting tree and process it to the final output. Arel uses various kinds of visitors to accomplish this (see the Visitor pattern).

In essence, the visitor pattern abstracts away how the nodes of an AST are processed from the nodes themselves. The nodes stay the same, yet it’s possible to apply different visitors and get different results. This is exactly what Arel needs to generate all those kinds of output formats.

The Arel’s implementation of the visitor pattern is interesting. It uses a variation called Extrinsic Visitor. The variation takes great advantage of Ruby’s dynamic behavior and the information available at runtime. Instead of forcing the nodes to implement the accept method, the visitor calls accept on itself with the root node as argument. It then inspects the node to find out its type and looks the appropriate visit method. To make the dispatching part faster, the code uses a simple hash table for caching purposes.

{ 
  Arel::Visitors::SQLite => {
    Arel::Nodes::SelectStatement => "visit_Arel_Nodes_SelectStatement",
    Arel::Nodes::SqlLiteral      => "visit_Arel_Nodes_SqlLiteral",
    Arel::Nodes::Or              => "visit_Arel_Nodes_Or",
    Arel::Attributes::Attribute  => "visit_Arel_Attributes_Attribute",
    Arel::Nodes::InnerJoin       => "visit_Arel_Nodes_InnerJoin",
    Arel::Nodes::Having          => "visit_Arel_Nodes_Having",
    Arel::Nodes::Limit           => "visit_Arel_Nodes_Limit"
    Fixnum                       => "visit_Fixnum",
  }
}

If we look into the visitors directory, we can see a few visitors that Arel comes with by default. Some of them directly correspond to a particular database, some are used only internally and some are used only from AR. Notice that all database related visitors inherit from the to_sql visitor, which is doing most of the work, and that the particular database visitor handles only the differences specific to the concrete database. Let’s create a select manager and get the SQL query out of it without the to_sql method.

select_manager = users.project(Arel.star)
select_manager.to_sql
# => SELECT * FROM "users"

sqlite_visitor = Arel::Visitors::SQLite.new(ActiveRecord::Base.connection)
collector      = Arel::Collectors::SQLString.new
collector      = sqlite_visitor.accept(select_manager.ast, collector)
collector.value
# => SELECT * FROM "users"

A collector is an object that gathers the results as they come in from the visitor. In this particular example, collector could have been a Ruby’s own String and we’d get the same result (without calling the final value of course). If we look at the actual source code of to_sql, we can see that it does the same except it gets the visitor directly from the connection.

Let’s take a look at one more visitor, Arel::Visitors::Dot. The visitor generates the Graphviz’s Dot format and we can use it to create diagrams out of an AST. To make things easier, there’s a convenient to_dot method we can use. We take the output and save it to a file.

File.write("arel.dot", select_manager.to_dot)

On the command line, we use the dot utility to convert the result to an image.

dot arel.dot -T png -o arel.png

Back to upper levels

We have all this power at our disposal at the Arel level but how can we leverage it with ActiveRecord? Turns out that we can very easily get the underlying Arel::Table directly from our models with <Table>.arel_table. What’s even better is that we can get the AST from our ActiveRecord’s queries and manipulate it. A word of warning though, working with the underlying Arel object is not officially supported and things may change between releases without notice.

First, we need a few throw-away tables and the corresponding ActiveRecord objects to work against. Let’s go again with users and comments.

class User < ActiveRecord::Base
  connection.create_table table_name, force: true do |t|
    t.string :name, null: false
    t.integer :karma, null: false, default: 0
    t.boolean :vip, null: false, default: false
    t.timestamps
  end

  create! [{name: "Alice", karma: 999, vip: true}, {name: "Bob", karma: 1000}, {name: "Charlie"}]

  has_many :comments, dependent: :delete_all
end

class Comment < ActiveRecord::Base
  connection.create_table table_name, force: true do |t|
    t.text :text, null: false
    t.integer :points, null: false, default: 0
    t.references :user
    t.timestamps
  end

  belongs_to :user
end

As mentioned in the previous paragraph, we can get the Arel::Table object out by calling arel_table on the model. Once we do that, we can use the same methods as we’ve been using so far throughout the text.

u = User.arel_table
User.where(u[:karma].gteq(1000).or(u[:vip].eq(true))).to_a
# => [#<User id: 1, name: "Alice"...>, #<User id: 2, name: "Bob"...>]

Here we’re passing an Arel node (Arel::Nodes::Grouping) directly to AR’s where. No need to convert anything as AR knows how to deal with these objects. Let’s switch the sides and use an AR query inside an Arel one.

User.first.comments.create! text: "Sample text!", points: 1001

c             = Comment.arel_table
popular_users = User.select(:id).order(karma: :desc).limit(5)
comments      = c.project(Arel.star).where(c[:points].gt(1000).and(c[:user_id].in(popular_users.ast)))

Comment.find_by_sql(comments.to_sql)

To execute Arel queries, we first need to get the SQL out of Arel and then feed it into find_by_sql. Notice that we called ast on popular_users before passing it to Arel’s in. That’s because popular_users is an instance of ActiveRecord::Relation and we need to get the underlying Arel AST.

There of course comes a time when you need to issue a query that doesn’t necessarily result in records coming back. In that case, we can use the connection directly and call execute with the SQL as the argument.

ActiveRecord::Base.connection.execute(c.where(c[:id].eq(1)).compile_delete.to_sql)

One issue you may run into when using ActiveRecord 4.1.x is that calling to_sql might return an SQL query with bind parameters instead of the actual values. The issue has been solved on the current master branch and will be part of the next release. To get around that issue now however, we must use unprepared_statement.

# ActiveRecord 4.1.x

sql = User.first.comments.to_sql
# => SELECT "comments".* FROM "comments"  WHERE "comments"."user_id" = ?

sql = User.connection.unprepared_statement { 
  User.first.comments.to_sql 
}
# => SELECT "comments".* FROM "comments"  WHERE "comments"."user_id" = 1

The code in the unprepared_statement block gets evaluated with a visitor that mixes in Arel::Visitors::BindVisitor, which immediately resolves the bind parameters.

Real world

Having covered all of that, how do we use this in a real word application so that the code is maintainable and won’t become a mess? One way of doing it is to create a class that will represent our query. Let’s take a look at a simple example.

class PrivilegedUsersQuery
  attr_reader :relation

  def initialize(relation = User.all)
    @relation = relation
  end

  def find_each(&block)
    relation.where(privileged_users).find_each(&block)
  end

  private

  def privileged_users
    with_high_karma.or with_vip
  end

  def with_high_karma
    table[:karma].gt(1000)
  end

  def with_vip
    table[:vip].eq(true)
  end

  def table
    User.arel_table
  end
end

We take full advantage of the fact that we can build queries iteratively and dedicate a method to each part or similar, whatever feels like the best approach for the particular situation.

PrivilegedUsersQuery.new.find_each do |user| 
 # ...
end

The end

Arel is a great tool to build abstractions upon and a powerful helper when the abstractions fail to provide the functionality you need. By now you know everything there’s to know to use Arel effectively and most importantly you know where to look for answers when constructing a complicated query or when things go wrong. Please let me know if you found an error of any kind or have other suggestions.