
Dynamic sites at static speed: the art of website caching

By Ben Cook Posted Nov. 12, 2015 Reading time: 7 minutes

A lot has been said recently about the merits of static sites. But in many situations a dynamic approach is a necessity. Whether it's a content management system, a customer relationship tool or an online store, a dynamic system lets end users maintain complex sites quickly and consistently. And when put together properly, dynamic sites can rival static sites for speed.

Whatever system you use, dynamic sites typically comprise similar elements: a web server, a backend (typically a database) and an application written in one or more programming languages. This combination of components gives a great deal of flexibility, but each contributes its own overhead and increases load time, something all modern websites want to avoid. This is especially true of database access; any application that needs to frequently read and write data will cause a noticeable delay.

This is where caching, and a caching strategy appropriate to your use case, will help. The basic aim of caching is to prevent unnecessarily frequent calls between the application and database layers and instead serve pre-generated static HTML pages, which are much faster to deliver to a browser.

 

Browser caching

The first cache that any web user will have noticed is the one in their browser. How many times have developers asked you to do a “force-refresh” to see changes? Browser caches are simple, which makes them a good starting point for explaining caching concepts. A browser stores copies of visited web pages on the user’s computer, typically refreshing them once per session if changes are detected or forced by the site.

 

Proxy caching

A common tool employed by site owners and administrators is a ‘reverse proxy cache’ that sits between the web browser and the web application. It intercepts requests and serves copies of pages straight from the cache, providing a noticeable speed boost.

There are several major proxy cache options available for self-install or as ‘Software as a Service’. (We are ignoring cloud hosting providers who typically package everything you might need into a self-contained web stack.)

Popular proxy cache options include:

SaaS options for caching generally lie in the world of Content Delivery Networks (CDNs) which, instead of placing a cache between a user and a web stack, serve users cached content from the location geographically closest to them. It’s a subtle distinction, but one that can make a significant difference for large sites with global audiences.

 

Using Varnish

Varnish is available through most Linux package managers, as a Docker image and in many other forms. Read the project’s installation page for more details.

Basic Varnish configuration

Varnish stores a default configuration file at either /usr/local/etc/varnish/default.vcl or /etc/varnish/default.vcl, written in VCL (Varnish Configuration Language). Varnish translates this file into C and compiles it into a small program with a C compiler, which boosts speed even more.

Depending how you installed Varnish, the configuration file will look something like this:

backend default { 
    .host = "127.0.0.1"; 
    .port = "8000"; 
}

At its simplest, this defines the default backend used by Varnish: the host and port of the web server that Varnish fetches content from when a page isn’t in the cache.

Backend polling

One handy feature of Varnish is checking at predefined intervals if a backend is still healthy. It’s called ‘Backend Polling’ and is configured by adding a probe section into the backend declaration:

.probe = { 
    .url = "/"; 
    .timeout = 34ms; 
    .interval = 1s; 
    .window = 10; 
    .threshold = 8; 
}

The above are example values rather than Varnish’s defaults. They tell Varnish to request .url every .interval; if at least .threshold of the last .window probes respond within .timeout, the backend is considered healthy. Once a backend is considered ‘unhealthy’, Varnish can keep serving content from the cache for a pre-defined period.
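To show where the probe sits, here is a minimal sketch of a complete backend declaration in Varnish 4 syntax. The host and port are carried over from the earlier example, and the one-hour grace period is an assumed value for illustration, not a recommendation.

```vcl
vcl 4.0;

# Backend host and port are taken from the earlier example.
backend default {
    .host = "127.0.0.1";
    .port = "8000";
    .probe = {
        .url = "/";        # path requested on every health check
        .timeout = 34ms;   # a probe slower than this counts as failed
        .interval = 1s;    # time between probes
        .window = 10;      # number of recent probes considered
        .threshold = 8;    # probes in the window that must succeed
    }
}

sub vcl_backend_response {
    # Assumed value: keep objects for an hour past their TTL so that
    # stale copies can still be served while the backend is struggling.
    set beresp.grace = 1h;
}
```

With these numbers the backend is marked unhealthy as soon as three of the last ten probes fail or time out.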

Starting Varnish

We’ll cover specific changes to the Varnish configuration under each platform option. For now, let’s take a look at general options.

Ports

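Varnish usually takes over port 80, so the port for your web server will first need changing from the default. For example, in the Apache Vhost configuration change the port to 81 or 8080, and make sure the .port value in the backend definition matches.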

Start the Varnish daemon with the varnishd command or using a service wrapper. The daemon has many flag options, the most common and useful being:

  • -f: Sets the path to the configuration file.
  • -s: Cache storage options. Setting this to malloc keeps the cache in RAM, which will provide even greater speed boosts.

Checking all is working

Run the varnishstat command or visit isvarnishworking.com to check your Varnish server is ready and listening for requests.

What not to cache

There are certain parts of a site that we don’t want to cache, for example the administration pages. We can exclude them by creating a vcl_recv subroutine in the default.vcl file containing an if statement that defines what not to cache:

sub vcl_recv { 
    # URI of the admin folder (replace /url/ with your own path)
    if (req.url ~ "^/url/")
    {
        return (pass); 
    }
    return(lookup); 
}

If you are using Varnish 4, things are slightly different, including return values. The vcl_recv subroutine now returns hash instead of lookup.

sub vcl_recv { 
    ...
    return(hash); 
}

This is also where we set sites or subdomains that Varnish should ignore, by adding a condition such as req.http.host ~ "example.com" to the if statement.
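Combining a host check with a path check might look like the sketch below. The subdomain dev.example.com and the /admin/ path are hypothetical placeholders, so swap in your own values.

```vcl
sub vcl_recv {
    # Hypothetical host and path: never serve these from the cache.
    if (req.http.host ~ "^dev\.example\.com$" || req.url ~ "^/admin/") {
        return (pass);
    }
    # No final return here, so every other request falls through
    # to Varnish's built-in vcl_recv logic.
}
```

Leaving out the final return statement keeps the built-in defaults in play, which is usually a safer starting point than forcing a lookup for everything.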

Cookies

By default Varnish will not cache content from the backend that sets cookies. Similarly, if the client sends a cookie, the request bypasses the cache and goes straight to the backend.

Cookies are frequently used by sites to track user activity and store user-specific values. Generally these cookies are only of interest to client-side code and are of no interest to the backend or Varnish. We can tell Varnish to ignore cookies, except in particular areas of the site:

if (!(req.url ~ "^/admin/")) {
    unset req.http.Cookie;
}

This if statement ignores cookies unless we are in the admin area of the site, where cookie passing may be of more use (unless you really want to frustrate site administrators).
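If dropping every cookie is too blunt, a gentler variation is to strip only the cookies that client-side tracking scripts set and leave the rest alone. The sketch below assumes Google Analytics style cookie names (__utma and friends, _ga); adjust the pattern to whatever your own site sets.

```vcl
sub vcl_recv {
    if (req.http.Cookie) {
        # Assumed cookie names: remove analytics cookies only.
        set req.http.Cookie = regsuball(req.http.Cookie,
            "(^|;\s*)(__utm[a-z]+|_ga)=[^;]*", "");
        # Tidy up a leading separator left behind by the removal.
        set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
        # If nothing is left, drop the header so the page can be cached.
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```

Requests that still carry a session cookie after this clean-up continue to bypass the cache, which is normally what you want for logged-in users.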

Other exceptions

With a default installation, Varnish also doesn’t cache password-protected pages (requests carrying an Authorization header) or any request other than GET and HEAD.

 

Putting Varnish to use

We will now look at two perfect use cases for Varnish: Drupal and Magento. Both are highly dynamic systems that allow non-technical users to undertake a wide variety of complex tasks. This can lead to database query-heavy page loads, and busy sites will become noticeably slow. Typical pages built with these systems mix content that is updated frequently with content that rarely changes.

Drupal

Drupal has default caching options that perform similar functions to Varnish, but won’t provide the flexibility or speed increases required by larger or more complex sites.

In true Drupal manner there is a module for handling Varnish integration to save some of the manual configuration outlined above.

Install the module and make sure you follow the installation instructions included in the module’s Read Me file.

Make sure that the /etc/default/varnish file has the following daemon options set (and the indentation is important):

DAEMON_OPTS="-a :80 \
        -T localhost:6082 \
        -f /etc/varnish/default.vcl \
        -S /etc/varnish/secret \
        -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,128M"

Ensure Apache and any associated virtual hosts are listening on port 8080, not 80. Restart both services after making these changes.

You may need to set a ‘Varnish Control Key’ in the module configuration page. Find out what that key is with the cat /etc/varnish/secret command and paste it into the settings page. Select the correct Varnish version, save the settings and you should see a series of green ticks at the bottom of the page.

The Varnish module interacts with the default Drupal cache settings, so make sure you have that enabled and configured for your use case.

Run varnishstat from the command line, start navigating the site as an anonymous user and you should see stats changing in the command output.

One set of paths we don’t want to cache in Drupal is the admin pages. We can exclude them with a vcl_recv subroutine:

sub vcl_recv { 
    # URI of admin folder
    if (req.url ~ "^/admin/")
    {
        return (pass); 
    }
    unset req.http.Cookie;
    return(lookup); 
}

You might also want to consider not caching pages for logged-in users, system update pages and pages generated by highly dynamic modules such as Flag that make extensive use of AJAX to function. Do this by adding further req.url conditions to the if statement.
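As a rough sketch, the extra exclusions could be written as below. The paths are assumptions based on a stock Drupal site with clean URLs enabled, so check them against your own installation before relying on them.

```vcl
sub vcl_recv {
    # Assumed Drupal paths: admin, user account and Flag module pages.
    if (req.url ~ "^/(admin|user|flag)(/|$)") {
        return (pass);
    }
    # Assumed system scripts that should always reach the backend.
    if (req.url ~ "^/(update|cron|install)\.php") {
        return (pass);
    }
}
```

Varnish runs multiple vcl_recv definitions in the order they appear, so a fragment like this can sit ahead of the subroutine shown above.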

Magento

A default installation of Magento ships with an internal caching system that stores static versions of site elements in a specified folder. The System -> Cache Management page provides an overview of current cache status as well as letting you clear all or individual component caches. You can clear aggregated CSS and JS files and auto-generated image files from this page.

The forthcoming version 2 of Magento will support Varnish caching by default, but for now we need to make use of third-party plugins. I recommend the Turpentine module. Make sure you read the project’s readme file, as it notes some extra configuration steps; ignoring them may break your site.

The Turpentine module is highly configurable and will make the necessary changes to VCL files and Varnish config for you. Some key options to set are:

  • Backend Host: The Varnish host, defaults to 127.0.0.1
  • Backend Port: The port Varnish is running on, defaults to 80
  • URL Blacklist: A list of URLs to never cache relative to the Magento root. The admin and API urls are automatically included.

The Turpentine module ties into the default Magento cache, so clearing caches on the Varnish cache page will clear the relevant Varnish caches.

 

General tips

Aside from using Varnish with any of the dynamic systems above, here are a handful of other miscellaneous tips that will help the cache-ability of any site.

Consistent URLs

If you are serving the same content in different contexts, it should use the same URL. For example, don’t mix article.html, article.htm and article, even though your CMS may allow it. Doing so leads to three different cached versions of the same content.
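Fixing the links at source is the best cure, but Varnish can also paper over the problem by normalising URLs before it looks them up. This is a minimal sketch that assumes your backend happily serves the extensionless form of every page; if it doesn’t, this rewrite will break those URLs.

```vcl
sub vcl_recv {
    # Assumed URL scheme: treat /article.html and /article.htm as /article,
    # keeping any query string that follows.
    set req.url = regsub(req.url, "\.html?(\?|$)", "\1");
}
```

Because the rewritten URL is both the cache key and the URL sent to the backend, all three variants end up sharing a single cached copy.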

Use cookies sparingly

As we saw above, pages that use cookies are hard to cache, and cookies are rarely as necessary as we think. Try to limit their use and number to dynamic pages.

File handling

Loading site assets can be one of the most time-consuming parts of a page render, and there are simple tips to reduce this burden:

Using CSS image sprites for iconography instead of multiple small files results in fewer requests and less network traffic.

Hosting CSS and JavaScript libraries locally means less network traffic and more control over caching strategies. This can mean an increase in maintenance overhead to keep these assets up to date. Store these assets in consistently named folders so references to them can also be consistent.
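To give a flavour of what that control can look like on the Varnish side, here is a sketch in Varnish 4 syntax that caches locally hosted assets for longer than ordinary pages. The file extensions and the one-day lifetime are assumptions to tune for your own site.

```vcl
sub vcl_backend_response {
    # Assumed list of static asset extensions.
    if (bereq.url ~ "\.(css|js|png|gif|jpe?g|svg|ico)(\?.*)?$") {
        # Static files don't need cookies, and a Set-Cookie header
        # would stop Varnish from caching them.
        unset beresp.http.Set-Cookie;
        # Assumed lifetime: keep assets in the cache for a day.
        set beresp.ttl = 1d;
    }
}
```

A long lifetime works best when asset file names change with each release; otherwise visitors may be served an outdated stylesheet until the cache expires.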

 

Fast forward

I hope this introduction to speeding up your dynamic sites with caching was useful. The performance gain is worth an initial period of configuration, experimentation and tweaking. In this era of short attention spans and impatience, any speed gain you can squeeze out of your setup will make a difference to your users and set you apart from the competition.

 

Featured image, network cache image via Shutterstock.
