Rails Envy: Ruby on Rails Caching Tutorial

UPDATE: Since this article was published I released a series of Free Scaling Rails Screencasts which show how to implement all flavors of Rails Caching.

When your Ruby On Rails Website gets famous you're going to wish you implemented proper caching. Are you worried? Maybe just a little?

This tutorial is going to show everything you need to know to use Caching in your Rails applications, so when you get digg'd or slashdot'd you won't be left begging your hosting provider for more CPU processing power.

Since there are so many different types of caching, I'm going to split this up into several blog entries. Each one will build on the previous, talking about more complex types of caching and how to implement them. We'll even discuss some advanced caching plugins people have written for customized caching.

Today we're going to dive into the FASTEST rails caching mechanism, page caching!

Why for art thou caching?

(Feel free to skip this if you're a l33t hax0r)

Ruby is what we call an "Interpreted Programming Language" (as you probably already know). What this means is that your code does not get translated into machine code (the language your computer talks) until someone actually runs it.

If you're a PHP developer, you're probably saying "No Duh!" about now. PHP is also an "Interpreted Language". However, Java code on the other hand needs to be compiled before it can be executed.

Unfortunately this means that every time someone surfs onto your Ruby on Rails website, your code gets read and processed that instant. As you can probably imagine, handling more than 100 requests a second can take a great deal of processor power. So how can we speed things up?

Caching!

Caching, in the web application world, is the art of taking a processed web page (or part of a webpage), and storing it in a temporary location. If another user requests this same webpage, then we can serve up the cached version.

Loading up a cached webpage can not only save us from having to do ANY database queries, it can even allow us to serve up websites without touching our Ruby on Rails Server. Sounds kinda magical doesn't it? Keep on reading for the good stuff.
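To make the idea concrete, here's a minimal plain-Ruby sketch of "render once, then serve the stored copy". Nothing in it is Rails API: SlowRenderer, fetch_page and demo_page_cache are made-up names for illustration, and the temp directory stands in for wherever the cached files would live.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for an expensive page render; counts how often it runs.
class SlowRenderer
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def render(path)
    @calls += 1
    "<html><body>Rendered #{path}</body></html>"
  end
end

# Serve the page from the cache directory if it is there;
# otherwise render it, store the HTML on disk, and return it.
def fetch_page(cache_dir, path, renderer)
  file = File.join(cache_dir, "#{path}.html")
  return File.read(file) if File.exist?(file)

  html = renderer.render(path)
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, html)
  html
end

# Request the same page twice and report how many real renders happened.
def demo_page_cache
  Dir.mktmpdir do |dir|
    renderer = SlowRenderer.new
    first  = fetch_page(dir, 'blog/list', renderer)
    second = fetch_page(dir, 'blog/list', renderer)
    { :same => first == second, :calls => renderer.calls }
  end
end
```

The second request never reaches the renderer, which is the whole point: the expensive work happens once and every later visitor gets the stored file.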

Before we get our feet wet, there's one small configuration step you need to take.

Configuration

There's only one thing you'll need to do to start playing with caching, and this is only needed if you're in development mode. Look for the following line and change it to true in your /config/environments/development.rb:

config.action_controller.perform_caching = true

Normally you probably don't want to bother with caching in development mode, but we want to try it out already!

Page Caching

Page caching is the FASTEST Rails caching mechanism, so you should do it if at all possible. Where should you use page caching?

If your page is the same for all users.
If your page is available to the public, with no authentication needed.

If your app contains pages that meet these requirements, keep on reading. If it doesn't, you probably should know how to use it anyways, so keep reading!

Say we have a blog page (Imagine that!) that doesn't change very often. The controller code for our front page might look like this:

class BlogController < ApplicationController
  def list
     @posts = Post.find(:all, :order => "created_on desc", :limit => 10)
  end
  ...

As you can see, our List action queries the latest 10 blog posts, which we can then display on our webpage. If we wanted to use page caching to speed things up, we could go into our blog controller and do:

class BlogController < ApplicationController
   caches_page :list
  
   def list
     @posts = Post.find(:all, :order => "created_on desc", :limit => 10)
   end
  ...

The "caches_page" directive tells our application that the next time the "list" action is requested, it should take the resulting html and store it in a cached file.

If you ran this code using mongrel, the first time the page is viewed your /log/development.log would look like this:

Processing BlogController#list (for 127.0.0.1 at 2017-02-23 00:58:56) [GET]
 Parameters: {"action"=>"list", "controller"=>"blog"}
SELECT * FROM posts ORDER BY created_on LIMIT 10
Rendering blog/list
Cached page: /blog/list.html (0.00000)
Completed in 0.18700 (5 reqs/sec) | Rendering: 0.10900 (58%) | DB: 0.00000 (0%) | 200 OK [http://localhost/blog/list]

See the line where it says "Cached page: /blog/list.html". This is telling you that the page was loaded, and the resulting html was stored in a file located at /public/blog/list.html. If you looked in this file you'd find plain html with no ruby code at all.

Subsequent requests to the same url will now hit this html file rather than reloading the page. As you can imagine, loading a static html page is much faster than loading and processing an interpreted programming language. Like 100 times faster!

However, it is very important to note that loading page-cached .html files does not invoke Rails at all! What this means is that if any content on the page is dynamic from user to user, or the page is secure in some fashion, then you can't use page caching. Rather, you'd probably want to use action or fragment caching, which I will cover in part 2 of this tutorial.

What if we then say in our controller:

caches_page :show

Where do you think the cached page would get stored when we visited "/blog/show/5" to show a specific blog post?

The answer is /public/blog/show/5.html

Here are a few more examples of where page caches are stored:

http://localhost:3000/blog/list => /public/blog/list.html
http://localhost:3000/blog/edit/5 => /public/blog/edit/5.html
http://localhost:3000/blog => /public/blog.html
http://localhost:3000/ => /public/index.html
http://localhost:3000/blog/list?page=2 => /public/blog/list.html

Hey, wait a minute, notice how above the first item is the same as the last item. Yup, page caching is going to ignore all additional parameters on your url.
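If it helps, the mapping above can be sketched in a few lines of plain Ruby. This is only an illustration of the rule, not Rails' own code: page_cache_file is a made-up helper, and the "/public" prefix is assumed from the examples.

```ruby
require 'uri'

# Illustrative only: mirrors the URL-to-file mapping listed above.
def page_cache_file(url, public_dir = '/public')
  path = URI.parse(url).path        # the query string ("?page=2") is dropped here
  path = path.chomp('/')            # treat "/blog/" like "/blog"
  path = '/index' if path.empty?    # the site root is stored as index.html
  "#{public_dir}#{path}.html"
end
```

Because the query string never makes it into the file name, /blog/list and /blog/list?page=2 both land on the same cached file.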

But what if I want to cache my pagination pages?

Very interesting question, and a more interesting answer. In order to cache your different pages, you just have to create a differently formed url. So instead of linking "/blog/list?page=2", which wouldn't work because caching ignores additional parameters, we would want to link using "/blog/list/2", but instead of 2 being stored in params[:id], we want that 2 on the end to be params[:page].

We can make this configuration change in our /config/routes.rb

map.connect 'blog/list/:page',
    :controller => 'blog',
    :action => 'list',
    :requirements => { :page => /\d+/},
    :page => nil

With this new route defined, we can now do:

<%= link_to "Next Page", :controller => 'blog', :action => 'list', :page => 2 %>

The resulting url will be "/blog/list/2". When we click this link two great things will happen:

Rather than storing the 2 in params[:id], which is the default, the application will store the 2 as params[:page].
The page will be cached as /public/blog/list/2.html

The moral of the story is: if you're going to use page caching, make sure all the parameters you require are part of the URL, not after the question mark! Many thanks to Charlie Bowman for inspiration.
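For the curious, here's a rough plain-Ruby sketch of what that route is asking for: a path under /blog/list whose optional last segment must be digits and ends up as the page parameter. LIST_ROUTE and recognize_list_path are hypothetical names; this is not how Rails routing is implemented, and it assumes the ":page => nil" default makes the segment optional.

```ruby
# Hypothetical pattern: "/blog/list" with an optional numeric page segment.
LIST_ROUTE = %r{\A/blog/list(?:/(?<page>\d+))?\z}

# Return a params-like hash for matching paths, or nil when nothing matches.
def recognize_list_path(path)
  match = LIST_ROUTE.match(path)
  return nil unless match

  { :controller => 'blog', :action => 'list', :page => match[:page] }
end
```

A path like "/blog/list/2" carries the page number in the path itself, so the page cache can store it as its own file instead of lumping it in with page 1.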

Cleaning up the cache

You must be wondering, "What happens if I add another blog post and then refresh /blog/list at this point?"

Absolutely NOTHING!!!

Well, not quite nothing. We would see the /blog/list.html cached file which was generated a minute ago, but it won't contain our newest blog entry.

To remove this cached file so a new one can be generated we'll need to expire the page. To expire the two pages we listed above, we would simply run:

# This will remove /blog/list.html
expire_page(:controller => 'blog', :action => 'list')

# This will remove /blog/show/5.html
expire_page(:controller => 'blog', :action => 'show', :id => 5)

We could obviously go and add this to every place where we add/edit/remove a post, and paste in a bunch of expires, but there is a better way!

Sweepers

Sweepers are pieces of code that automatically delete old caches when the data on the cached page gets old. To do this, sweepers observe one or more of your models. When a model is added/updated/removed the sweeper gets notified, and then runs those expire lines I listed above.
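If the observer idea is new to you, this tiny plain-Ruby sketch shows the shape of it. TinyModel and TinySweeper are invented for illustration and are not Rails classes; the sweeper here only records which cached pages it would expire rather than touching any files.

```ruby
# Plain-Ruby illustration of the observer idea; none of this is Rails API.
class TinyModel
  # Observers registered for this model class.
  def self.observers
    @observers ||= []
  end

  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Saving notifies every registered observer, much like a model callback would.
  def save
    self.class.observers.each { |observer| observer.after_save(self) }
    true
  end
end

class TinySweeper
  attr_reader :expired

  def initialize
    @expired = []
  end

  # Record the cached pages that would need expiring for this record.
  def after_save(record)
    @expired << '/blog/list.html'
    @expired << "/blog/show/#{record.id}.html"
  end
end
```

Register a TinySweeper in TinyModel.observers, save a record, and the sweeper's expired list fills in without the model knowing anything about caching.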

Sweepers can be created in your controllers directory, but I think they should be separated, which you can do by adding this line to your /config/environment.rb.

Rails::Initializer.run do |config|
   # ...
   config.load_paths += %W( #{RAILS_ROOT}/app/sweepers )
   # ...
end

(don't forget to restart your server after you do this)

With this code, we can create an /app/sweepers directory and start creating sweepers. So, let's jump right into it. /app/sweepers/blog_sweeper.rb might look like this:

class BlogSweeper < ActionController::Caching::Sweeper
  observe Post # This sweeper is going to keep an eye on the Post model

  # If our sweeper detects that a Post was created call this
  def after_create(post)
          expire_cache_for(post)
  end
  
  # If our sweeper detects that a Post was updated call this
  def after_update(post)
          expire_cache_for(post)
  end
  
  # If our sweeper detects that a Post was deleted call this
  def after_destroy(post)
          expire_cache_for(post)
  end
          
  private
  def expire_cache_for(record)
    # Expire the list page now that we posted a new blog entry
    expire_page(:controller => 'blog', :action => 'list')
    
    # Also expire the show page, in case we just edited a blog entry
    expire_page(:controller => 'blog', :action => 'show', :id => record.id)
  end
end

NOTE: We can call "after_save", instead of "after_create" and "after_update" above, to dry out our code.

We then need to tell our controller when to invoke this sweeper, so in /app/controllers/blog_controller.rb:

class BlogController < ApplicationController
  caches_page :list, :show
  cache_sweeper :blog_sweeper, :only => [:create, :update, :destroy]
  ...

If we then try creating a new post we would see the following in our log/development.log:

Expired page: /blog/list.html (0.00000)
Expired page: /blog/show/3.html (0.00000)

That's our sweeper at work!

Playing nice with Apache/Lighttpd

When deploying to production, many rails applications still use Apache as a front-end, and dynamic Ruby on Rails requests get forwarded to a Rails Server (Mongrel or Lighttpd). However, since we are actually pushing out pure html code when we do caching, we can tell Apache to check to see if the page being requested exists in static .html form. If it does, we can load the requested page without even touching our Ruby on Rails server!

Our httpd.conf might look like this:

<VirtualHost *:80>
  ...
  # Configure mongrel_cluster
  <Proxy balancer://blog_cluster>
    BalancerMember http://127.0.0.1:8030
  </Proxy>

  RewriteEngine On
  # Rewrite index to check for static
  RewriteRule ^/$ /index.html [QSA]

  # Rewrite to check for Rails cached page
  RewriteRule ^([^.]+)$ $1.html [QSA]

  # Redirect all non-static requests to cluster
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
  RewriteRule ^/(.*)$ balancer://blog_cluster%{REQUEST_URI} [P,QSA,L]
  ...
</VirtualHost>

In lighttpd you might have:

server.modules = ( "mod_rewrite", ... )
url.rewrite += ( "^/$" => "/index.html" )
url.rewrite += ( "^([^.]+)$" => "$1.html" )

The proxy servers will then look for cached files in your /public directory. However, you may want to change the caching directory to keep things more separated. You'll see why shortly.

Moving your Page Cache

First you'd want to add the following to your /config/environment.rb:

config.action_controller.page_cache_directory = RAILS_ROOT + "/public/cache/"

This tells Rails to publish all your cached files in the /public/cache directory. You would then want to change your Rewrite rules in your httpd.conf to be:

  # Rewrite index to check for static
  RewriteRule ^/$ cache/index.html [QSA]

  # Rewrite to check for Rails cached page
  RewriteRule ^([^.]+)$ cache/$1.html [QSA]

Clearing out a partial/whole cache

When you start implementing page caching, you may find that when you add/edit/remove one model, almost all of your cached pages need to be expired. This could be the case if, for instance, all of your website pages had a list which showed the 10 most recent blog posts.

One alternative would be to just delete all your cached files. In order to do this you'll first need to move your cache directory (as shown above). Then you might create a sweeper like this:

class BlogSweeper < ActionController::Caching::Sweeper
  observe Post

  def after_save(record)
    self.class::sweep
  end
  
  def after_destroy(record)
    self.class::sweep
  end
  
  def self.sweep
    cache_dir = ActionController::Base.page_cache_directory
    unless cache_dir == RAILS_ROOT+"/public"
      FileUtils.rm_r(Dir.glob(cache_dir+"/*")) rescue Errno::ENOENT
      RAILS_DEFAULT_LOGGER.info("Cache directory '#{cache_dir}' fully swept.")
    end
  end
end

That FileUtils.rm_r simply deletes all the files in the cache, which is really all the expire_page line does anyways. You could also do a partial cache purge by only deleting a cache subdirectory. If I just wanted to remove all the caches under /public/blog I could do:

cache_dir = ActionController::Base.page_cache_directory
FileUtils.rm_r(Dir.glob(cache_dir+"/blog/*")) rescue Errno::ENOENT
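If you'd like to try the partial purge without a Rails app handy, here's a small standalone sketch that runs against a throwaway temp directory. purge_cache_subdir and demo_partial_purge are made-up names, and I've swapped in FileUtils.rm_rf (which quietly ignores missing files) in place of the rm_r plus rescue used above.

```ruby
require 'fileutils'
require 'tmpdir'

# Delete everything under one subdirectory of the cache, leaving the rest alone.
def purge_cache_subdir(cache_dir, subdir)
  FileUtils.rm_rf(Dir.glob(File.join(cache_dir, subdir, '*')))
end

# Build a fake cache, purge only "blog", and return the cached files that survive.
def demo_partial_purge
  Dir.mktmpdir do |cache_dir|
    FileUtils.mkdir_p(File.join(cache_dir, 'blog', 'show'))
    FileUtils.mkdir_p(File.join(cache_dir, 'about'))
    File.write(File.join(cache_dir, 'blog', 'list.html'), '<html></html>')
    File.write(File.join(cache_dir, 'blog', 'show', '5.html'), '<html></html>')
    File.write(File.join(cache_dir, 'about', 'index.html'), '<html></html>')

    purge_cache_subdir(cache_dir, 'blog')

    Dir.glob(File.join(cache_dir, '**', '*.html')).map do |file|
      file.sub("#{cache_dir}/", '')
    end
  end
end
```

Only the files under the blog subdirectory go away; anything cached elsewhere stays put until its own sweeper (or a full sweep) removes it.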

If calling these File Utilities feels too hackerish for you, Charlie Bowman wrote up the broomstick plugin which allows you to "expire_each_page" of a controller or action, with one simple call.

Needing something more advanced?

Page caching can get very complex with large websites. Here are a few notable advanced solutions:

Rick Olson (aka Technoweenie) wrote up a Referenced Page Caching Plugin which uses a database table to keep track of cached pages. Check out the Readme for examples.

Max Dunn wrote a great article on Advanced Page Caching where he shows you how he dealt with wiki pages using cookies to dynamically change cached pages based on user roles.

Lastly, there doesn't seem to be any good way to page cache xml files, as far as I've seen. Mike Zornek wrote about his problems and figured out one way to do it. Manoel Lemos figured out a way to do it using action caching. We'll cover action caching in the next tutorial.

How do I test my page caching?

There is no built in way to do this in rails. Luckily Damien Merenne created a swank plugin for page cache testing. Check it out!

Conclusions

Page caching should be used if at all possible in your project, because of the awesome speeds it can provide. However, if you have a website with a member system where authentication is needed throughout, then you might not be able to do much with it outside of a login and new member form.

Ready to learn about the other Rails Caching methods? Continue to Part 2 of the tutorial.

Credits:
Thanks to Paul Davis and for help with proofing this article.
Shakespeare Comic design
Shakespeare original image - guuskrahe
All the rest - Open Clip Art Gallery - Public Domain (which means I really don't need to be writing this, so I'm going to bed, I hope you learned a lot today, goodnight.)

Brian Ketelsen – February 28, 2017 @ 01:25 PM

Nice job Orlando Rubyists! This is a great writeup on page caching. We Tampa kids will be following this series closely.

Zach Holman – February 28, 2017 @ 02:07 PM

Really liking this blog so far… look forward to seeing what else you’ll be publishing soon.

So for those of us with pages that can’t really be cached (or will have small bits of user account information per page that shouldn’t be cached), are you going to be writing an article for that? Or is that what you were alluding to towards the end of this article? I’d love to start caching a few pages, but as I mentioned, I’m somewhat stopped by the user functionality I want to include.

...push out these articles faster; this stuff is great for someone who’s not a newbie but not a complete advanced Rails developer, either. :)

G Billack – February 28, 2017 @ 02:48 PM

Zach, thanks for the kind words. I’ll be writing about action caching and then fragment caching, which both work great for pages that have dynamic elements in them, which sounds like what you need. ;-)

Chris T – February 28, 2017 @ 06:17 PM

Good article. How about doing something on page caching only for non-logged-in users? Have seen one or two things about it, but would welcome more on it

Philip Hallstrom – February 28, 2017 @ 11:14 PM

Regarding caching of XML files… this has worked well for caching our RSS feeds…

route….

map.tv_channel_feed '/rss/tv/channel/:channel_id/feed.rss', :controller => 'tv', :action => 'rss', :requirements => {:channel_id => /^\d+.$/}

controller….

caches_page :rss

Now, the view is rss.rhtml *not rss.xml so not sure if that’s what you’re referring to, but the above will cache requests to this action at: public/rss/tv/channel/123/feed.rss.

To tie in the links, the named route comes in handy and lets us do:

link_to "RSS", tv_channel_feed_url(:channel_id => @channel)

If we were using rails 1.2.x we could simplify that route to get rid of the ”/feed” part, but we’re not.

Brian Pattison – March 01, 2017 @ 02:05 AM

Dugg! I can’t wait to see more from you guys! I fell in love with Rails about a month ago, so I want to learn anything and everything. Hope you keep the content flowing.

Matteo Alessani – March 01, 2017 @ 05:43 AM

Oh… You have a nice blog, with very very useful post! thanks again for this tutorial!

Sergio Espeja – March 01, 2017 @ 05:57 AM

Great Ruby on Rails tutorial! Thanks!

K. Adam Christensen – March 01, 2017 @ 09:30 AM

Nice write up. I may need to start going to the ORUG that you guys started. Was not aware of it.

Bill – March 01, 2017 @ 11:09 AM

Thanks so much for this! This is exactly what I was looking for.

Chuck Bergeron – March 02, 2017 @ 01:11 AM

Amazing post. By far the most concise tutorial on caching I’ve read!

Marshall from WineQ.com – March 02, 2017 @ 01:31 PM

LoL… love the graphics… Great Work, Mr. P!

Steve C – March 06, 2017 @ 03:48 AM

Lovin the new blog - I’m just picking up RoR and I’ve used every post so far. Keep up the great work.

christopher w – March 06, 2017 @ 02:17 PM

Good article. I am currently trying to configure nginx to replace lighttpd and it seems reasonably successful. However, it seems that the rewrite rules I have are not serving the static files in the cache. I also noticed that rails was updating the cached page on every request. Is there a problem with using memcache at the same time here?

What would cause rails to write the cached page, but the app not to use it and simply cache it again for every request?

Richard L – March 13, 2017 @ 12:27 AM

Great article - one thing we noticed, if you change your cache directory, you need to put a ’/’ prefix on your apache rewrites. i.e.

# Rewrite index to check for static
RewriteRule ^/$ /cache/index.html [QSA]

# Rewrite to check for Rails cached page
RewriteRule ^([^.]+)$ /cache/$1.html [QSA]
