Slickstream's New Cloud

Since Slickstream went live a year ago, we've been growing quickly. Last October we managed a handful of websites, indexed a few thousand webpages daily, and processed a few hundred gigabytes of traffic per month. Now we handle more than 600 websites, index more than half a million webpages in real time, and over the past month have processed more than 30 terabytes of traffic.

For our infrastructure, we've had two simple goals: 100% uptime and response times that are faster than page load times, so that we integrate seamlessly and beautifully into every customer's site.

We haven't always succeeded. Growing by a factor of 100 in one year isn't easy. 

We've had to convert a little propeller plane into a jumbo jet without ever landing! We've had a few short unplanned outages, usually caused by human error, but have maintained uptimes above 99.9%. We've generally kept response times within 500 milliseconds for at least 90% of pageviews and searches.
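To put those two numbers in perspective, here is a small sketch of the arithmetic behind them. A 99.9% uptime target leaves room for roughly 43 minutes of downtime in a 30-day month, and the response-time figure is simply the share of requests that finish within a threshold. The function names and the sample latencies are made up for illustration; this is not our monitoring code.

```typescript
// Minutes of downtime permitted by an uptime target over a period of days.
function downtimeBudgetMinutes(uptimeTarget: number, periodDays: number): number {
  const totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - uptimeTarget);
}

// Fraction of latency samples that are at or below the threshold.
function fractionWithin(latenciesMs: number[], thresholdMs: number): number {
  if (latenciesMs.length === 0) return 0;
  const within = latenciesMs.filter((ms) => ms <= thresholdMs).length;
  return within / latenciesMs.length;
}

// Illustrative (invented) samples: nine fast responses and one slow one.
const sampleLatenciesMs = [100, 200, 300, 400, 450, 480, 490, 495, 499, 900];
const monthlyBudget = downtimeBudgetMinutes(0.999, 30);
const shareWithin500 = fractionWithin(sampleLatenciesMs, 500);
console.log(monthlyBudget.toFixed(1), shareWithin500);
```

Over a full year the same target allows about 8.76 hours of downtime in total.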

Over the past two months, we've had a major initiative underway behind the scenes. Today, October 1st, 2020, I'm pleased to announce that we have completed that effort. The result is a system with dramatically increased scalability, resiliency, extensibility, and fault tolerance. We completed the rollover to this new system this morning without any downtime.

Want to know how it all works? Sure! At its core, the Slickstream cloud includes more than a dozen different types of servers running our own code that handle various functions for our customers, from indexing to pageview processing to story hosting to billing. Each of these is built using Node.js and packaged into a Docker container. We use Kubernetes to deploy any number of each of these servers in our cloud, so we can "dial up" as much capacity of whatever type we need almost instantly. In addition, we have three different web client codebases, separate from our server code, that our servers deliver to browsers. We use OVH as our data center provider and Cloudflare as our CDN to ensure fast access worldwide.
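As a rough illustration of what "dialing up" capacity means, the sketch below works out how many copies of each server type would be needed for a given load. It is a simplified model, not our production code: the server type names, the load figures, and the per-replica capacities are all hypothetical, and the rule used (load divided by per-replica capacity, rounded up) is just one simple way to size a deployment.

```typescript
// Hypothetical description of the load on one server type.
interface Load {
  requestsPerSecond: number;   // current demand (assumed figure)
  perReplicaCapacity: number;  // requests per second one replica can handle (assumed figure)
}

// Number of replicas needed to carry the load, never fewer than minReplicas.
function desiredReplicas(load: Load, minReplicas = 1): number {
  if (load.perReplicaCapacity <= 0) {
    throw new Error("perReplicaCapacity must be positive");
  }
  const needed = Math.ceil(load.requestsPerSecond / load.perReplicaCapacity);
  return Math.max(minReplicas, needed);
}

// Build a replica plan for every server type in the input.
function planCapacity(loads: Record<string, Load>): Record<string, number> {
  const plan: Record<string, number> = {};
  for (const [serverType, load] of Object.entries(loads)) {
    plan[serverType] = desiredReplicas(load);
  }
  return plan;
}

// Illustrative server types and numbers only.
const examplePlan = planCapacity({
  indexer: { requestsPerSecond: 250, perReplicaCapacity: 100 },
  billing: { requestsPerSecond: 0, perReplicaCapacity: 50 },
});
console.log(examplePlan);
```

In Kubernetes, a count like this can be applied to a deployment with `kubectl scale deployment <name> --replicas=<n>`, or left to an autoscaler.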

We have organized our cloud into "clusters". Clusters are siloed from each other. Each has its own MongoDB database cluster, its own Elasticsearch cluster, and its own home-grown command-and-control system. When we roll out new software, we will now do that one cluster at a time. Issues will inevitably come up, but this greatly reduces the chance of a problem in one cluster affecting customers in other clusters. We monitor this new cloud using dashboards built in Grafana, with Graphite aggregating our real-time application data.
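The one-cluster-at-a-time rollout can be sketched as a simple loop: update a cluster, check that it is healthy, and only then move on. The code below is a minimal sketch under stated assumptions, not our actual command-and-control system. The `deploy` and `healthy` hooks, the cluster names, and the version string are hypothetical, and the hooks are synchronous here only to keep the example short.

```typescript
// Hypothetical hooks supplied by the caller.
interface RolloutHooks {
  deploy(cluster: string, version: string): void; // push the version to one cluster
  healthy(cluster: string): boolean;              // check the cluster after deploying
}

interface RolloutResult {
  updated: string[];       // clusters that were updated and passed the health check
  failedAt: string | null; // first cluster that failed its check, if any
}

// Update clusters in order, stopping at the first unhealthy one.
function rollOut(clusters: string[], version: string, hooks: RolloutHooks): RolloutResult {
  const updated: string[] = [];
  for (const cluster of clusters) {
    hooks.deploy(cluster, version);
    if (!hooks.healthy(cluster)) {
      // Stop here so the remaining clusters keep running the old version.
      return { updated, failedAt: cluster };
    }
    updated.push(cluster);
  }
  return { updated, failedAt: null };
}

// Illustrative run: the second cluster reports a problem, so the third is never touched.
const touched: string[] = [];
const outcome = rollOut(["cluster-a", "cluster-b", "cluster-c"], "v2", {
  deploy: (cluster) => { touched.push(cluster); },
  healthy: (cluster) => cluster !== "cluster-b",
});
console.log(outcome, touched);
```

The point of the design is the early return: a bad release is contained to the cluster where it was first detected.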

We also have a separate staging cloud that is a duplicate of what is described above. That's where we develop and test out new code before it goes live.

We're proud of what we've accomplished. But we hope this is the last time you'll ever have to hear about it. Our goal is for all of this to remain invisible to you, so that you can simply enjoy consistently fast, reliable service.

So, we're glad to be serving you from the new Slickstream cloud!