Why are you moving to the cloud?

It is becoming more and more normal for companies to consider the cloud for their hosting requirements, particularly for new builds. The advantages of cloud hosting are now well known, and it is no longer a bleeding-edge technology but something that is providing real value to businesses of all sizes.

Some of the momentum behind the cloud seems to be driven by cost; in particular, there is a perception that running in the cloud is cheaper than running on-premises. I have been involved with many architectures (both new and old) that have moved to the cloud, and the majority of these moves have been driven by cost. This assumption, coupled with a lack of knowledge of how to manage a cloud infrastructure and use its benefits, is actually driving the cost of hosting up, as well as hurting performance and stability. I have long been preaching the need to understand exactly how the cloud works, along with the recommended practices and culture changes required when moving to it, and I thought it was about time I shared these with the wider world.

Time to Scale

Most cloud providers offer auto-scaling functionality to help you deal with peaks in demand. This is a critical factor in ensuring performance, and one of the best parts of cloud hosting. The other, often missed, part is scaling down. This is just as crucial, as it optimises the cost of your hosting and ensures you really do pay only for what you need. You should think through your scaling strategy alongside your architecture to ensure every part of it is ready. That strategy should cover vertical as well as horizontal scaling. This is particularly applicable to Platform as a Service (PaaS) offerings, which come in differing tiers and tend to hold you to ransom for the higher-performance ones.
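As a minimal sketch of a scaling rule that works in both directions, the function below uses proportional ("target tracking" style) logic: if average CPU utilisation is double the target, the fleet doubles; if it is half the target, the fleet roughly halves. The metric, the function name and its parameters are illustrative assumptions rather than any provider's API, and real policies also add cooldowns so the fleet doesn't flap.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional scaling decision, clamped to [min_size, max_size].

    The floor (min_size) keeps enough servers for resilience; the ceiling
    (max_size) caps cost. Scaling down happens automatically whenever the
    metric falls below the target.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    # Scale the fleet in proportion to how far the metric is from the target.
    raw = math.ceil(current * metric / target)
    return max(min_size, min(max_size, raw))


print(desired_capacity(4, 90.0, 45.0, min_size=2, max_size=20))  # → 8 (scale out)
print(desired_capacity(8, 20.0, 60.0, min_size=2, max_size=20))  # → 3 (scale in)
```

Clamping matters as much as the formula: without a floor a quiet night could scale you down to nothing, and without a ceiling a runaway metric becomes a runaway bill.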

Ideally you should have a view of your traffic profiles over time and adjust your auto-scaling parameters to account for them. For instance, on weekdays you might scale between x and y web servers, but at weekends you might scale between y and z. The same applies to vertical scaling, such as the tier you run a PaaS service at. Scaling is not a silver bullet either, and you should consider the following when planning your scaling strategy:

  • Scaling isn’t ‘instant’ in most cases, and can sometimes take minutes to hours. If you know you have spikes at certain times, then scale beforehand
  • Scaling any third-party software may have licensing implications, so beware
  • Consider all elements of your architecture when setting scaling parameters. There is no point in scaling out to 30 web servers if you will just create a bottleneck on one database server
  • Consider infrastructure automation tools such as Salt and Chef to help create advanced scaling rules, as well as to manage the auxiliary services affected by scaling
  • Scaling should not replace capacity planning. Scaling is to optimise cost and cope with spikes in demand, and should be built on top of a well-planned architecture
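The weekday/weekend traffic profile and the advice to scale beforehand for known spikes can be combined into schedule-based scaling bounds. The sketch below is hypothetical throughout: the bounds, the 09:00 spike, the 30-minute lead time and the two-hour spike window are made-up numbers you would replace with figures from your own traffic profiles, and the resulting (min, max) pair would be fed into whatever auto-scaling mechanism your provider offers.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Hypothetical (min_servers, max_servers) bounds per day type.
WEEKDAY_BOUNDS = (4, 20)
WEEKEND_BOUNDS = (2, 8)

# Hypothetical known weekday spike and the floor we want in place for it.
SPIKE_HOUR = 9
SPIKE_MIN_SERVERS = 12
SPIKE_DURATION = timedelta(hours=2)
# Scaling isn't instant, so raise the floor this long before the spike starts.
SCALE_LEAD_TIME = timedelta(minutes=30)


def bounds_for(now: datetime) -> Tuple[int, int]:
    """Return the (min, max) server bounds that should apply at `now`."""
    is_weekend = now.weekday() >= 5  # Saturday = 5, Sunday = 6
    low, high = WEEKEND_BOUNDS if is_weekend else WEEKDAY_BOUNDS
    if not is_weekend:
        spike_start = now.replace(hour=SPIKE_HOUR, minute=0, second=0, microsecond=0)
        # Pre-scale: the raised floor starts SCALE_LEAD_TIME before the spike.
        if spike_start - SCALE_LEAD_TIME <= now < spike_start + SPIKE_DURATION:
            low = max(low, SPIKE_MIN_SERVERS)
    return low, high


print(bounds_for(datetime(2024, 1, 1, 8, 45)))  # Monday, just before the spike → (12, 20)
print(bounds_for(datetime(2024, 1, 6, 9, 0)))   # Saturday → (2, 8)
```

Raising only the floor ahead of the spike leaves normal auto-scaling free to go higher if demand exceeds the forecast, while the lower weekend ceiling doubles as a cost guard.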

Stability

When moving to the cloud, I often see companies assume they can outsource their networking and infrastructure knowledge. It’s true that one benefit of cloud hosting is that you no longer have to manage infrastructure and networking yourself, but you still need to understand them, and in particular how your cloud hosting provider deals with them, in order to design stable systems. Graceful degradation, dependency failures and the like don’t disappear in the cloud, but they can be hidden from you if you don’t fully understand how your hosting provider presents these issues. This can lead to architectures that ignore failure scenarios an on-premises design would have accounted for. Stability is a huge issue in the cloud, and one that you should be thinking about from day one. If you are migrating an existing on-premises architecture, this is doubly important: you should not expect the same levels of stability, and should plan how your architecture will cope with multiple service failures (which are a fact of cloud life).
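To make graceful degradation concrete, here is a minimal sketch assuming a product page whose recommendations come from a separate, non-critical service. `fetch_recommendations` and `render_product_page` are hypothetical names, and the failing service is simulated; the point is that a failed dependency degrades the page rather than taking it down. In a real system you would also give the dependency call a timeout, since a hung dependency can be as damaging as a failed one.

```python
import logging
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def with_fallback(call: Callable[[], T], fallback: T) -> T:
    """Call a non-critical dependency, degrading to `fallback` if it fails."""
    try:
        return call()
    except Exception:  # deliberately broad: any failure here should degrade, not crash
        log.warning("dependency failed; serving degraded response", exc_info=True)
        return fallback


# Hypothetical non-critical dependency, simulated as being down.
def fetch_recommendations() -> List[str]:
    raise ConnectionError("recommendations service unavailable")


def render_product_page(product: str) -> Dict[str, object]:
    return {
        "product": product,  # core content is always served
        "recommendations": with_fallback(fetch_recommendations, []),  # optional extra
    }


print(render_product_page("kettle"))  # → {'product': 'kettle', 'recommendations': []}
```

The design choice is deciding up front which dependencies are critical and which are optional; only the optional ones get a fallback, so genuine outages of core services still surface loudly.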

I often see the cloud’s inherent instability as an advantage, as it forces you to think about resiliency and failover from day one, which only serves to increase the overall stability of the application. However, I often see people disregard this aspect and rely instead on their cloud provider’s SLAs and good old backups. Stability in the cloud basically means ‘anything can happen’, and it usually does. Services will frequently fail, resource contention will affect performance, servers will be self-healed (restarted or replaced) underneath you, demand spikes will add further contention, and so on. You need to account for all of this when deploying: try to eliminate every single point of failure and have failover plans in place for all parts of the architecture.
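One way to remove a single point of failure on the client side is to try each redundant endpoint in turn. The sketch below assumes you already run replicas in more than one zone; the endpoint names and the `query` function are hypothetical placeholders for your own client code, with zone A simulated as down.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every replica has been tried and none succeeded."""


def call_with_failover(replicas: Sequence[str], call: Callable[[str], T]) -> T:
    """Try each replica in order, returning the first successful result."""
    errors: List[Tuple[str, Exception]] = []
    for replica in replicas:
        try:
            return call(replica)
        except Exception as exc:  # record the failure and fail over to the next replica
            errors.append((replica, exc))
    raise AllReplicasFailed(errors)


# Hypothetical endpoints in two zones; zone A is "down" for this example.
REPLICAS = ["db.zone-a.internal", "db.zone-b.internal"]


def query(endpoint: str) -> str:
    if "zone-a" in endpoint:
        raise ConnectionError(f"{endpoint} unreachable")
    return f"result from {endpoint}"


print(call_with_failover(REPLICAS, query))  # → result from db.zone-b.internal
```

Collecting every error before giving up means that when all replicas are down, the exception tells you which ones failed and why, which is exactly what you need during an incident.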

Testing is a critical part of deploying to the cloud, and should include some level of failure testing: typically, checking how your architecture behaves under load when certain services fail. If you are on AWS, then Netflix’s Chaos Monkey (part of the Simian Army toolset) will help you introduce this type of testing, as well as giving you the rare privilege of yelling ‘Release the Monkeys!’ every time you deploy.
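Chaos Monkey works at the infrastructure level by terminating instances. The same idea can be applied much earlier, in ordinary automated tests, by injecting failures into a dependency and checking that the service still answers. The following is a minimal in-process sketch under that assumption; it is not Chaos Monkey’s API, and `get_price`, `handle_request` and the 30% failure rate are hypothetical.

```python
import random
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def inject_failures(call: Callable[[], T], failure_rate: float,
                    rng: random.Random) -> Callable[[], T]:
    """Wrap a dependency so that roughly `failure_rate` of calls raise."""
    def wrapper() -> T:
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return call()
    return wrapper


# Hypothetical dependency and request handler under test.
def get_price() -> float:
    return 9.99


def handle_request(price_source: Callable[[], float]) -> Dict[str, object]:
    try:
        return {"price": price_source(), "degraded": False}
    except ConnectionError:
        return {"price": None, "degraded": True}


def run_failure_test(requests: int = 1000, failure_rate: float = 0.3,
                     seed: int = 42) -> Dict[str, int]:
    rng = random.Random(seed)  # seeded so the test is repeatable
    source = inject_failures(get_price, failure_rate, rng)
    results = [handle_request(source) for _ in range(requests)]
    # Every request must get an answer, even if some are degraded.
    return {"served": len(results),
            "degraded": sum(1 for r in results if r["degraded"])}


print(run_failure_test())
```

Seeding the random generator keeps the test deterministic, so a failure in CI can be reproduced locally; combining this with a load test approximates the "under load when certain services fail" scenario described above.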

Culture, DevOps, and Tooling

Taking full advantage of the cloud will almost certainly require a change in culture, as well as new tooling in your architecture and deployment processes. Your culture should acknowledge and embrace the instability of the cloud through the following (as a minimum):

  • Developers should be ‘coding defensively’
  • Architects should be architecting for new failure scenarios
  • Load testing should be the norm
  • Monitoring and Alerting tools should be implemented from day one
  • Self-healing infrastructure should become the norm
  • Infrastructure automation should be utilised
  • Service failure testing (deliberately failing different elements within the architecture) should be routine
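As one example of ‘coding defensively’, transient failures are often handled with retries using capped exponential backoff and random jitter, so that many clients recovering at once don’t hammer a struggling service in lockstep. This is a minimal sketch: which exceptions count as transient, the attempt count and the delays are assumptions to tune for your own services, and retries should only wrap operations that are safe to repeat.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on transient errors with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait a random time between 0 and the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so tests can record delays instead of actually waiting; in production code you would leave it as `time.sleep`.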

Without adopting these practices, you are far more likely to fall foul of the cloud’s traps, and frankly the benefits of moving to the cloud will be lost.

Developers and DevOps should also be using the on-demand nature of the cloud to provide instant test servers, try out new products and services, and continually improve all aspects of the infrastructure. One common example is UI testing, which has historically proved a bit flaky. When running something like Selenium on your build server, there is potential for lots of hanging browsers, which over time will degrade the build server. To rectify this, you can instead spin up an on-demand server specifically to run the UI tests, and shut it down once they are complete. While this might lengthen the build, it does increase the stability of the overall process. The same method can be extended to add build agents as needed, to cope with load testing and so on. The underlying message is that the team’s culture should adapt to take advantage of what becomes available through cloud hosting.
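The spin-up/tear-down pattern for UI tests can be sketched with a context manager that guarantees the server is terminated even when the test run raises. `FakeCloud` is a hypothetical stand-in for your provider’s SDK (it only tracks server IDs in memory), and `run_ui_tests` is a placeholder for your Selenium suite.

```python
from contextlib import contextmanager
from typing import Iterator, Set


class FakeCloud:
    """Hypothetical stand-in for a provider SDK; swap in real launch/terminate calls."""

    def __init__(self) -> None:
        self.running: Set[str] = set()
        self._next_id = 0

    def launch(self, image: str) -> str:
        self._next_id += 1
        server_id = f"{image}-{self._next_id}"
        self.running.add(server_id)
        return server_id

    def terminate(self, server_id: str) -> None:
        self.running.discard(server_id)


@contextmanager
def ephemeral_server(cloud: FakeCloud, image: str) -> Iterator[str]:
    """Launch a throwaway server and always terminate it afterwards."""
    server_id = cloud.launch(image)
    try:
        yield server_id
    finally:
        # Runs on normal exit and when the test run raises an exception.
        cloud.terminate(server_id)


def run_ui_tests(server_id: str) -> bool:
    # Hypothetical placeholder: point your Selenium suite at `server_id` here.
    return True


cloud = FakeCloud()
with ephemeral_server(cloud, "ui-test") as server:
    passed = run_ui_tests(server)
print(passed, cloud.running)  # → True set()
```

Because teardown sits in a `finally` block, a crashing test run still releases the server; a build process that is killed outright will not run it, so a provider-side auto-shutdown is a sensible backstop.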

Performance

If anyone has ever told you that the cloud is fast, then you have every justification to strap them to a chair and force them to endure endless reruns of the Frozen Sing-Along Edition while eating Rum and Raisin ice cream, a fate worse than enduring any Vogon poetry for sure. Of course the cloud can be fast if architected correctly, but compared to dedicated physical hardware... not likely. So if you are thinking of a ‘lift and shift’ approach to migrate your infrastructure to the cloud, this should be an immediate red flag to you and any architect involved. Hosting on-premises on dedicated hardware is an entirely different beast to hosting in the cloud. You might think it is a simple matter of provisioning the same number of VMs, but this really isn’t the case: it will certainly cost you more, and is unlikely to offer the same level of performance.

Performance in the cloud comes from using the available services, as well as the aforementioned ability to scale. Services such as Content Delivery Networks and SSD storage are likely part of your cloud provider’s offering, and should be used to enhance performance.

These are some of the things I have often seen forgotten when planning cloud hosting, and it’s always to the detriment of the solution.

Let me know your experiences of moving to the cloud, and in particular which things are often overlooked.