As an enterprise web development agency, we frequently deal with high-traffic, high-bandwidth sites. We use a horizontally-scalable architecture built on Amazon’s AWS platform to ensure great performance, high availability, and low costs for our clients. This allows us to serve essentially any amount of traffic to sites without breaking a sweat.
While most sites are primarily text-based, the larger size of images means that bandwidth from images can have an outsized effect on bandwidth cost and server load.
To tackle the problem of serving large volumes of images while minimising costs, we developed Tachyon, our scalable image service. Tachyon integrates with Amazon S3, and integrates with WordPress through the Tachyon plugin combined with our S3 Uploads plugin (but can also be used for non-WordPress projects).
Our First Attempts
When we initially looked at solving issues around images, we set out to solve two main issues: image regeneration, and caching. Rather than creating and storing thumbnails on upload, we wanted a dynamic system, which would allow us to easily create or change the available sizes, along with allowing complex cropping. To combine the dynamicism of this system with the desired performance, we’d also require heavy caching. Our initial solution to this was WPThumb, a PHP-based solution installed as a WordPress plugin.
While WPThumb handled a lot of our use cases, the server load it generated caused a lot of CPU time and power to be spent, and the caching solution still used disk space for the intermediate image files. This filesystem-based caching becomes a problem when horizontally scaling application servers, as operating a shared image cache requires using either an object store (such as Amazon’s S3) or a networked hard drive solution. As we moved to a horizontally-scaled infrastructure on AWS, we began to investigate alternatives that would allow us to move away from WPThumb. At the same time, we were moving our image storage to Amazon S3, which provided a durable, scalable object store for our images.
Our first port of call for moving to scalable image serving was to investigate cloud services, which manage the entire process automatically. We looked at several image services, settling on Imgix, a cloud image processor. Imgix is immensely powerful, with every feature we needed as well as many more, as well as a usable, developer-friendly API. Additionally, it supports S3 as a first-class source for our images, which was a huge benefit for us, allowing us to write less connecting code.
Initially, we bridged WPThumb and Imgix, allowing us to progressively migrate away from WPThumb’s storage. We eventually replaced this with WP-Imgix, a more tightly focussed plugin which offloaded most of the handling to the Imgix side.
While we used Imgix for a while, it turned out that Imgix was not really the service for us. Our requirements from an image service are pretty minimal (mostly limited to resizing and cropping), whereas Imgix is positioned as a full image service, with a huge number of features available. They also have a complex image serving infrastructure which, while very cool, was hugely overpowered for our needs. This was reflected in our monthly bill, which far exceeded the benefit we were getting from the service.
We began looking further afield for alternative solutions. The first one that came to mind was Automattic’s Photon service. This is the service used for all images served by WordPress.com, as well as Jetpack-connected WordPress sites. This service is only available for sites using Jetpack, and is subject to the same terms of service, both of which were barriers to our usage. Additionally, the scale of our sites and image serving meant that our image bandwidth was significant, and pushing that much traffic to Automattic out of the blue wasn’t something we wanted to do. (It would have also hidden a significant cost by essentially abusing the free Photon service rather than charging the actual costs.)
However, Photon is open source, so we immediately saw an opportunity. Starting first with Photon itself, then with a fork, and eventually with a rebuilt solution, we created our own image service: Tachyon.
Tachyon is our solution to serving WordPress images at scale. It’s a simple service that takes images from S3, performs image manipulation, and serves up the result. It consists of two parts: the image service, and the WordPress integration, which work in tandem for the full experience. We heavily leverage AWS services to provide a scalable, highly-available, low-cost service.
Tachyon is 100% open source, and is available under our standard support policy. You can deploy Tachyon in multiple ways, including serverless handling with AWS Lambda, or to EC2 using our production-ready CloudFormation configuration.
For caching and serving images, we place Tachyon behind AWS CloudFront, a globally-available CDN. We turn the caching way up, with a cache lifetime of 2 years. This combines with the immutability of WordPress uploads to provide low-cost, permanent caching for generated images.
(Why “Tachyon”? Tachyons are a theoretical faster-than-light particle; in other words, tachyons are faster than photons.)
Scalable WordPress Image Serving
Tachyon was originally a fork of Photon, with deep integration into Amazon S3 to streamline the image fetching. We realised that the architecture of Photon didn’t really match our infrastructure design or requirements. As a PHP application, it needs to run on a full-stack server (EC2 instance), which requires careful design for scaling and high availability. This would require at least two application servers with auto-scaling, as well as a load balancer, in order to provide the level of service we wanted.
To improve this process, we rewrote the functionality entirely from scratch in Node.js using the fantastic sharp library. This allowed us to deploy Tachyon in a serverless configuration, serving from AWS Lambda without requiring application server management. Until recently, reverse proxy EC2 instances were still required, as Amazon API Gateway did not fully support binary responses. This has changed recently, allowing us to remove our EC2 instances and move to a fully-serverless architecture. (This is not yet fully supported by CloudFormation, and requires a manual step to enable binary responses.)
As a global company, we have clients all around the world in most AWS regions. While Lambda is available in most regions, it’s not available in all of them, and was a pretty new service when we first began working on Tachyon. For these other regions, we fall back to a standard highly-available EC2 configuration. The good news is that Lambda has been enabled in almost every AWS region, so this fallback isn’t usually needed.
On top of the Tachyon application, we use AWS CloudFront for permanent (2 year TTL) caching and global content distribution. This CloudFront distribution automatically calls the Lambda function (via API Gateway) for a cache miss, generating the image on-the-fly before caching and distributing to the global CDN. This allows us to minimise costs, as the processing is only called when it’s necessary, and CloudFront cache storage is free: you pay only for outbound HTTP transfer from the client, and to the origin (Lambda), plus per request.
Tachyon is structured in such a way that everything works independently: you can use other CDNs atop Tachyon, or use it with other sites if you’re not serving WordPress images. Our typical setup has a single S3 bucket per region, so we use a single Tachyon Lambda function per region with CloudFront in front of it, with all of our region’s images passing through this. Multiple buckets require multiple Lambda functions, however you can use a single CloudFront distribution with multiple functions.
If you want to run Tachyon yourself, grab a copy of the Tachyon server code and pick your deployment variant. We highly recommend using Lambda, with CloudFront for caching, but your deployment strategy depends on what you want to do. We also offer a Docker container for quick and easy deployment to Docker environments (including for local development).
Tachyon’s only dependencies are images stored in S3, and sharp/libvip’s requirements (node v4+, plus node-gyp for the native library).
Your WordPress sites will also require the Tachyon plugin, which sets the thumbnail URLs in WordPress correctly. We also recommend using our S3 Uploads plugin for pushing uploads to S3, however third-party S3 plugins should work fine.
We’d love to hear from anyone using Tachyon, and improve it based on feedback. We’ve been running Tachyon in production on high-traffic websites for over a year and we’re excited to show it off to the wider community.