Highly Available, Self-Healing, and Fault-Tolerant Applications on AWS

AWS is well known for touting the tenets of high availability, self-healing, and fault tolerance. In this post we'll use resources that justify those claims and show some of the magic you can realize with the AWS architecture. We'll create several EC2 instances as part of a launch configuration and use an Elastic Load Balancer to manage traffic to them. We'll then add an Auto Scaling group and see that when instances are killed, new ones take their place. When an instance is terminated, the load balancer stops sending traffic to it while the Auto Scaling group launches a new instance to replace it.

Making a Load Balancer

It's important to realize that load balancers and Auto Scaling groups are both part of the EC2 service, so the EC2 console will be the starting point for everything we use in this tutorial. In the left sidebar of the EC2 console, click Load Balancers and create a new load balancer. Creation is pretty straightforward, but be sure to pick a VPC, and you'll need to add subnets. Yours should look something like this.

You'll also need to configure the health check. I've set the ping protocol to TCP for this example. You likely don't need to change anything else here unless you're doing something exotic.
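If you prefer the command line, the same setup can be sketched with the AWS CLI for a Classic Load Balancer. The load balancer name, subnet IDs, and security group ID below are hypothetical placeholders:

```shell
# Create a Classic Load Balancer listening on HTTP port 80.
# "web-elb" and the subnet/security group IDs are placeholder values.
aws elb create-load-balancer \
  --load-balancer-name web-elb \
  --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" \
  --subnets subnet-aaaa1111 subnet-bbbb2222 \
  --security-groups sg-cccc3333

# Configure the health check to ping TCP port 80, mirroring the console settings above.
aws elb configure-health-check \
  --load-balancer-name web-elb \
  --health-check Target=TCP:80,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=10
```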

The Launch Configuration

The next step is to create a launch configuration. This is very similar to launching a single instance from the EC2 console, but here you define the template that will be used to provision the instances behind your load balancer. In the configuration details tab, be sure to choose "Assign a public IP address to every instance". Here is what it looks like:

We'll also configure some user data so we can clearly see the load balancer hitting an individual instance. Use the script below to install the Apache web server and serve a basic web page.

#!/bin/bash
# Update installed packages and install Apache.
yum update -y
yum install -y httpd

# Start Apache now and enable it on every reboot.
service httpd start
chkconfig httpd on

# Give ec2-user group ownership of the web root via a www group.
groupadd www
usermod -a -G www ec2-user
chown -R root:www /var/www
chmod 2775 /var/www
find /var/www -type d -exec chmod 2775 {} +
find /var/www -type f -exec chmod 0664 {} +

# Serve a simple test page so we can see the stack working.
echo "<h1> IT'S WORKING !!!!</h1>" > /var/www/html/test.html
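As a rough CLI equivalent, if you save the script above as userdata.sh, a launch configuration can be created like this. The configuration name, AMI ID, key pair, and security group are hypothetical placeholders:

```shell
# Create a launch configuration that runs the user-data script on first boot.
# "web-lc", the AMI ID, key name, and security group are placeholder values.
aws autoscaling create-launch-configuration \
  --launch-configuration-name web-lc \
  --image-id ami-0123456789abcdef0 \
  --instance-type t2.micro \
  --key-name my-key \
  --security-groups sg-cccc3333 \
  --associate-public-ip-address \
  --user-data file://userdata.sh
```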

The AutoScaling Group

You can use the Auto Scaling group creation wizard here. The key point is that we want to set metrics that will cause AWS to launch new instances. This is what justifies the claim that AWS is self-healing and fault tolerant.
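The console wizard corresponds roughly to this CLI call, which attaches the group to the launch configuration and load balancer created earlier. The group name, subnets, and sizing values are placeholders chosen for illustration:

```shell
# Create an Auto Scaling group spanning two subnets, registered with the ELB.
# Using the ELB health check means unhealthy instances are replaced automatically.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-asg \
  --launch-configuration-name web-lc \
  --min-size 2 --max-size 4 --desired-capacity 2 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222" \
  --load-balancer-names web-elb \
  --health-check-type ELB \
  --health-check-grace-period 300
```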

Cattle not Pets

If one instance has a problem, the health check detects it, the instance is terminated, and a new healthy instance replaces it. This is the self-healing process. The idiom "cattle not pets" is often used to describe this approach: cattle are interchangeable and disposable, whereas pets are nursed back to health when they get sick. It's far less effort to automatically kill a server and start another one than it is to diagnose and repair it.

Setting the Autoscale alarms

Pick the VPC and subnets and you are now ready to add scaling alarms.

Beware of the max size: it caps how many instances the group will ever launch, so scaling stops there no matter what your alarms say.

Create an alarm to add an instance.

Don't forget to take an action once your alarm is set.
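With the CLI, "taking an action" means wiring a scaling policy to a CloudWatch alarm. Here's a sketch; the names are hypothetical and the 70% CPU threshold is just an example:

```shell
# Create a policy that adds one instance; the command prints the policy ARN.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-asg \
  --policy-name add-one-instance \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 1

# Alarm when average CPU across the group exceeds 70% for two 5-minute periods.
# Replace <policy-arn> with the ARN returned by the command above.
aws cloudwatch put-metric-alarm \
  --alarm-name web-asg-high-cpu \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 70 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=AutoScalingGroupName,Value=web-asg \
  --alarm-actions <policy-arn>
```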

Go check your EC2 dashboard and watch your instances launch.

After your instances have had time to register with the load balancer, you can follow the DNS address of the load balancer, found here.

When you append "/test.html" to that DNS name you'll see the web page we created in the user data, served by one of the instances registered with the load balancer. We can see it working below.
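You can verify the same thing from a terminal. The hostname below is a placeholder for your load balancer's actual DNS name:

```shell
# Fetch the test page through the load balancer a few times;
# each request may land on a different instance behind the ELB.
for i in 1 2 3; do
  curl -s http://web-elb-1234567890.us-east-1.elb.amazonaws.com/test.html
done
```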

After terminating an instance you'll see the Auto Scaling magic as a new instance is launched and registered with the load balancer. Auto Scaling keeps a record of this activity, and you can see below where I killed an instance and Auto Scaling started another one to replace it.
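To reproduce that from the CLI, terminate an instance and then inspect the group's activity history. The instance ID and group name are placeholders:

```shell
# Kill one of the instances in the group.
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0

# Watch Auto Scaling record the termination and the replacement launch.
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name web-asg \
  --max-items 5
```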

Where Does it Leave Us?

The key point is that we have extended the simple use of an EC2 instance by adding a load balancer. The load balancer is arguably the AWS resource that gives you high availability; in this example we used it to front a web server and serve a simple page. Self-healing was demonstrated by the combination of the launch configuration and Auto Scaling: the launch configuration let us provision instances for the load balancer, while Auto Scaling let us set alarms and take specific actions to increase or decrease the number of instances based on specific metrics. The load balancer's ability to run health checks and stop sending traffic to unhealthy instances is what makes the setup fault tolerant.

Take it Further

What we've discussed here is nearly endlessly extensible in the context of AWS cloud computing. It would be easy to take this further and use Route 53 to purchase a domain name and create a record set pointing to the Elastic Load Balancer. Then users would see your application or website, and under the hood you'd have the security and high availability of AWS. I hope you can take some of these concepts and architect your own solutions on AWS.