Over the past two and a half years, the number of applications we host for customers has grown steadily to several dozen projects today. Throughout the years we have stuck to one principle at every point: keeping things simple. We successfully develop software using this principle, and we successfully operate our customers’ applications using it as well. Dijkstra knew that “simplicity is prerequisite for reliability”, and we would add that it is the prerequisite for security, maintainability, scalability and basically whatever else you want.
Until now, we hosted our projects on a single, more or less powerful machine, with a backup on a secondary server that kept database and files in sync. Failover (which we fortunately never had to use) was handled by manually routing traffic from the primary to the backup machine. As the number of projects grew, we went looking for a (simple but stable, powerful but cheap) solution that offers
- fully automated, real-time fail-over for HTTP traffic
- scalable design
- economical efficiency
- ready for our upcoming Ruby on Rails cloud™ infrastructure
What came out of it is what you see in Figure 1. The setup consists of a publicly accessible layer where DNS points to the (web) servers. All traffic is terminated there. Incoming traffic is then routed internally to the backend application servers.
For the web slaves, we have two machines running FreeBSD 8.2 that act as reverse proxies (utilizing nginx) to the backend application servers (Passenger). Since most of our applications use SSL, quite a lot of public IP addresses are configured on both BSD machines. Failover for the web servers¹ (on the IP level) is handled by the amazing CARP protocol.
CARP can be configured to run in MASTER/BACKUP mode, which means the MASTER gets all the traffic and the BACKUP only comes into play when the master fails. CARP can also run in a load balancing mode where traffic is distributed among the participating machines. To signal that it is up, the master machine sends out advertisement packets so the backup slaves know it is alive. If the backups do not see these advertisement packets for some time, one of the backup slaves becomes the new master. See the FreeBSD Handbook and the OpenBSD FAQ for more details.
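To give an idea of what this looks like in practice, here is a minimal sketch of a CARP setup in /etc/rc.conf on FreeBSD 8.x. The interface name, virtual host ID, password and the address (from the documentation IP range) are placeholders, not our actual configuration:

```shell
# /etc/rc.conf on the master web server (sketch with placeholder values)
cloned_interfaces="carp0"
# vhid identifies the redundancy group; both machines must share it,
# along with the same password and virtual IP address
ifconfig_carp0="vhid 1 pass s3cret 203.0.113.10/24"

# On the backup machine, a higher advskew makes its advertisements
# less preferred, so it only takes over when the master goes silent:
# ifconfig_carp0="vhid 1 advskew 100 pass s3cret 203.0.113.10/24"
```

DNS points at the shared virtual IP, so clients never need to know which physical machine is currently answering.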
The web servers each have an interface in our public VLAN (that is, the Internet) as well as one in the private VLAN, which is used for connections to the application servers. Failover (or load balancing) for the backend application servers is carried out by nginx’s proxy module.
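A backend failover of this kind can be expressed with an upstream block in the nginx configuration. The following is only a sketch; the upstream name, addresses and ports are hypothetical:

```nginx
# Hypothetical upstream definition; addresses and ports are placeholders.
upstream app_backend {
  server 10.0.0.11:8080;          # primary application server (Passenger)
  server 10.0.0.12:8080 backup;   # only receives traffic when the primary fails
}

server {
  listen 80;
  location / {
    proxy_pass http://app_backend;
  }
}
```

With the `backup` flag, nginx routes all requests to the primary and transparently fails over to the secondary when the primary stops responding.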
This failover on the IP level is so important for us because we have dozens of domains pointing to unique IPs in our infrastructure (as mentioned, because of SSL), and we do not even have control over some of the DNS servers. In case of hardware problems with the web servers, the secondary is waiting to take over automatically with nearly no downtime. The same goes for the application servers: if one of them dies, the secondary automatically takes over.
A nice side effect of this setup is that we can easily make use of nginx’s caching capabilities, and SSL termination is offloaded to separate machines. We are currently planning to add a layer to our infrastructure where customers can have dedicated Rails instances in the backend and benefit from the failover web slaves, too.
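As a rough illustration of the caching side effect, an nginx proxy cache can be enabled on the web slaves with a few directives. Cache path, zone name and timings below are made-up values, and the upstream name is assumed to be defined elsewhere in the configuration:

```nginx
# Hypothetical caching setup; paths, zone name and timings are placeholders.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=app_cache:10m max_size=1g;

server {
  listen 80;
  location / {
    proxy_cache app_cache;
    proxy_cache_valid 200 10m;        # keep successful responses for 10 minutes
    proxy_pass http://app_backend;    # upstream assumed to be defined elsewhere
  }
}
```

This way, cached responses are served directly from the web layer without ever touching the application servers.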
We rolled out this new infrastructure a few weeks ago, using Puppet for automated configuration of all involved parts. We hope to give a detailed report about our (positive) experience at some point in the future.
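To sketch what managing such a setup with Puppet can look like, here is a minimal manifest that keeps nginx installed, running, and restarted on configuration changes. The module name and file paths are assumptions, not our actual manifests:

```puppet
# Hypothetical manifest sketch; paths and module names are placeholders.
package { 'nginx':
  ensure => installed,
}

service { 'nginx':
  ensure  => running,
  enable  => true,
  require => Package['nginx'],
}

# FreeBSD keeps third-party configuration under /usr/local/etc
file { '/usr/local/etc/nginx/nginx.conf':
  source => 'puppet:///modules/nginx/nginx.conf',
  notify => Service['nginx'],   # reload nginx whenever the config changes
}
```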
¹ In case you are wondering about the hostnames: we name our machines after professors of the computer science department at the University of Augsburg, where most of us studied (or still enjoy studying, in the case of our student trainee and those writing their thesis at makandra :)).