Hi! I'm Wade, head of Platform Engineering at Wikia. What does that mean? It means I am responsible for keeping Wikia's servers up and running, and for making them as fast as possible for our communities.
In this job I’ve often been asked “How are you able to successfully serve over a billion pageviews a month for Wikia?” The four-word answer is: Caching, Caching, Caching and SSDs! OK, that’s actually 3 words and an acronym but let’s not get hung up on the grammatical details. :)
Research & Key terms
Wikia's TechOps team has spent a significant amount of time and effort understanding how our users consume our service so we can deliver pages consistently and quickly. We work closely with our development teams to research new tools, methods, software and hardware, squeezing out every millisecond of performance we can find while keeping the cost of running the service as low as possible, so that it can stay free for everyone. Analysis of our users and traffic is a key part of our team's work.
In describing our technical setup, a couple of key terms you will see me use:
- Cache - the storage of website files for later re-use at a point more quickly accessed by the end user (this is essentially a copy of a page)
- Edge cache - the first layer of cache closest to our users.
- Varnish - the application we use to serve edge cache pages.
- Memcache - a high-performance database caching application used to minimize the need to access the MySQL database for content that's been queried previously.
- MySQL database servers - servers that are used to store and retrieve all the content in our communities' wikis.
- Buffer cache - the cache used by the MySQL server to avoid the need to get data from the SSDs.
- Solid State Disks (SSDs) - high-performance drives where the database stores all Wikia content.
How does it all work?
I'll now walk you through the major stops within our technical setup. The most important thing to know is that caching is made up of a number of layers, and you will encounter a different layer depending on what wiki or page you are visiting as well as your location in the world. Let's imagine you're visiting your favorite page on Runescape. Once you get there, behind the scenes, this is what happens:
The first stop is to our Content Delivery Network (CDN) which is operated by Fastly (a company founded by former Wikian Artur Bergman). Fastly uses Varnish to deliver you a page from the edge cache. The edge cache contains all the most frequently viewed pages from your region in the last 24 hours.
If you visit a page that has not been visited by another person in your region in the last 24 hours, you are not served from the edge cache, but instead you're referred to one of our Apache servers, the next tier of our service. Once an Apache server gets your request it will look into the memcache object store to see if the page already exists. If the Apache finds your page in memcache you will be automatically served the page from memcache and the edge cache server will be updated so it now has a copy too.
If your page is not found in memcache, the Apache will continue to the next tier, the MySQL database server. Here there can be another cached version of the page in what is called the buffer cache. If the page is in the buffer cache, it will be built and served to you from there. If not, the MySQL database server will read the page from our SSDs to get you the authoritative version of the page.
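The walk through the tiers described above can be sketched in a few lines of Python. This is only an illustration, not Wikia's actual code: the `Tier` class, the `fetch_page` function and the page title are hypothetical names, each cache is modeled as a plain dictionary, and the choice to copy a page into every tier that missed is an assumption (the text above only states that the edge cache is updated after a memcache hit).

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One cache layer, modeled as an in-memory dict (an assumption)."""
    name: str
    store: dict = field(default_factory=dict)


def fetch_page(title, tiers, ssd):
    """Check each cache tier in order and fall back to the SSD copy.

    Returns (page, name_of_the_layer_that_served_it).
    """
    missed = []
    for tier in tiers:
        if title in tier.store:
            page, served_by = tier.store[title], tier.name
            break
        missed.append(tier)
    else:
        # No cache tier had the page: read the authoritative copy.
        page, served_by = ssd[title], "ssd"

    # Assumption: every tier that missed gets a copy for the next visitor.
    for tier in missed:
        tier.store[title] = page
    return page, served_by


if __name__ == "__main__":
    tiers = [Tier("edge"), Tier("memcache"), Tier("buffer_cache")]
    ssd = {"Example_page": "<html>Example page</html>"}
    print(fetch_page("Example_page", tiers, ssd)[1])  # first visit
    print(fetch_page("Example_page", tiers, ssd)[1])  # repeat visit
```

In this sketch the first visitor pays for the full trip to the SSDs, and the next visitor is answered by the first tier, which is the behavior the layers are designed to produce.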
Wow that's a lot! And it all happens in less than 200 milliseconds - or 1/5th of 1 second!
So where in the cache layers do most people end up? Well, we’ve been successful at achieving an edge cache hit ratio over 90% and we’ve seen as high as 94%! That means 90-94% of our users get served out of the edge cache in microseconds rather than milliseconds.
Of those who don't get served from the edge cache, 85% of them are served by the Apaches from the memcache. That brings the number of user requests satisfied from a cache up to 95-98%. The remaining are served from either the MySQL buffer cache or directly from the SSDs.
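As a rough sketch of how the layers add up, the share of requests answered by some cache is the edge hit ratio plus the memcache hit ratio applied to whatever the edge missed. The function name below is hypothetical, and the calculation treats the quoted percentages as exact, which real traffic will not be.

```python
def combined_hit_ratio(edge_hit, memcache_hit_of_misses):
    """Fraction of requests served by the edge cache or by memcache."""
    return edge_hit + (1.0 - edge_hit) * memcache_hit_of_misses


if __name__ == "__main__":
    for edge_hit in (0.90, 0.94):
        total = combined_hit_ratio(edge_hit, 0.85)
        print(f"edge {edge_hit:.0%} -> cached overall {total:.1%}")
```

Taken at face value, these inputs give roughly 98.5% and 99.1%, a little above the 95-98% quoted above. The quoted range presumably reflects measured traffic, where the memcache hit ratio varies, so the idealized arithmetic and the observed numbers need not match exactly.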
How are we improving?
We're always looking at ways to improve our service, which means we're constantly reviewing and tuning our caching layers. We're continually researching new tools, software and components, and have built one of, if not THE only, entirely SSD-based production environments. Since SSDs operate 100 to 200 times faster than traditional hard drives, we're able to support hundreds of thousands of wikis and offer services such as Semantic MediaWiki or DPL that would be impossible on a service of our scale with traditional hard drives.
Since the foundation of our service is built entirely on open source software, from its critical underpinning, MediaWiki, to Linux, Apache, Memcache, MySQL, Solr, Varnish, Squid, KVM, Nagios, Vyatta, Ganglia, Chef and a myriad of other projects, we try to contribute back to the open source development community. We feel this is an important part of our TechOps role, and look forward to continuing and growing our collaborations.
I hope this gives you a small (but detailed) glimpse into what we do here at Wikia in the TechOps corner. If you have any questions, please ask them below!