Cloud is not a glorified VPS
In spite of the deafening hype around cloud technologies over the past few years, I am surprised at how anemically the cloud is leveraged by most companies I come across. When a company or a person says they are on the cloud, a large percentage of them just mean spinning up an EC2 instance and installing their app server, database, etc. of choice on it. VPS offerings have provided exactly this for the past decade. So what is so special about using a cloud like this?
I have also seen some of our customers truly leverage the cloud to great competitive advantage. It is not rocket science. It just takes awareness of the options out there and a willingness to change the development and deployment architecture to exploit them.
I broadly classify cloud features into two sets: IAAS features and Software block as a service.
IAAS features:
Let’s get the easy thing over with first.
Beyond putting in a credit card and getting an EC2 instance, leverage the true IAAS features to set up a programmable, scalable and highly available deployment using features like:
• Availability Zones
• Auto Scaling
• CloudWatch
A detailed blog post on some best practices and anti-patterns for AWS is coming soon.
Software block as a service:
This is the more interesting and powerful cloud capability.
I deliberately didn’t use the word PAAS and coined the term SBAAS instead, because vendors are peddling many different layers of services under the PAAS acronym. What I mean by Software block as a service is the set of software infrastructure components that the cloud provider runs for you.
RDS is an easy one. Instead of getting an EC2 instance and deploying MySQL or Postgres on it, consider RDS. It saves a lot of pain in installing and patching, not to mention horizontal scaling with near zero effort read replicas.
Many startups worry about cost when they consider services like RDS. Instead, they put MySQL on Digital Ocean because it is cheaper to start with. But if you take into account the manpower cost of deploying and maintaining the database and the business cost of downtime, I would say RDS is the way to go in most cases.
One of our customers runs RDS with some tables holding a billion+ rows. They hardly spend any time on DBA work with RDS, and if you compare that to the effort required to run this on your own hardware, it’s a no-brainer.
Lambda is a really exciting service that enables the idiom of serverless architecture. If you have processing logic that needs to run in response to an event, the traditional way is to write it as a service and deploy it on an app server hosted on an EC2 instance. Then you worry about capacity planning for this EC2 instance to meet the expected load.
Compare this to Lambda. You simply write the processing function in Java or JS and configure it to be called on an event like a Kinesis message or an HTTP request. That’s it. No server deployment, no capacity planning. AWS takes care of provisioning the appropriate resources, calls your Lambda and bills you only for what you have used.
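To make the Kinesis case concrete, here is a minimal sketch of such a function. It is written in Python purely for brevity (the same shape applies in Java or JS). The function names and the `form_id` field are hypothetical; the only thing taken from AWS is the event layout, where each record carries its payload base64-encoded under `kinesis.data`.

```python
import base64
import json


def kinesis_handler(event, context):
    """Lambda entry point: invoked once per batch of Kinesis records."""
    results = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded under record["kinesis"]["data"].
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        results.append(process(message))
    return {"processed": len(results), "results": results}


def process(message):
    # Hypothetical business logic; the real work (e.g. PDF conversion) goes here.
    return {"form_id": message.get("form_id"), "status": "done"}
```

Notice what is absent: no server setup, no thread pool, no queue polling loop. The function only describes what to do with one batch.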
Let me give you an example. Recently we had to write a PDF service that converts an enterprise form into a PDF, processes all its attachments, which can be of various types like Excel or Word documents, and appends them to the PDF.
Our initial approach was to write a Java service that receives the request and calls various PDF libraries like iText and Flying Saucer, and to deploy it on Tomcat on EC2. We had to handle the deployment ourselves and, more importantly, capacity planning was a pain. All the PDF conversion libraries load the whole file into memory and are CPU intensive, and we don’t have an upper limit on the file size.
Our second approach was to simply write a Lambda function that calls the same PDF conversion logic. AWS takes care of all the capacity headaches.
You can even deploy small web apps by putting your Angular code on S3 and hooking up your HTTP GET / POST calls to a Lambda function. This is a truly serverless web app.
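As a rough sketch of the backend half of such an app, the function below answers GET and POST requests. It assumes the request reaches Lambda through an HTTP front end that passes `httpMethod` and `body` in the event and expects `statusCode`, `headers` and `body` back (the proxy-style event used by API Gateway); that wiring, the function names and the response payloads are assumptions for illustration, not something prescribed above.

```python
import json


def http_handler(event, context):
    """Route an HTTP request forwarded to Lambda (proxy-style event assumed)."""
    method = event.get("httpMethod", "")
    if method == "GET":
        return respond(200, {"items": []})  # hypothetical empty listing
    if method == "POST":
        body = json.loads(event.get("body") or "{}")
        return respond(201, {"received": body})
    return respond(405, {"error": "method not allowed"})


def respond(status, payload):
    return {
        "statusCode": status,
        "headers": {
            "Content-Type": "application/json",
            # The Angular app is served from S3, a different origin, so allow CORS.
            "Access-Control-Allow-Origin": "*",
        },
        "body": json.dumps(payload),
    }
```

The static Angular bundle on S3 then calls this endpoint like any other JSON API.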
To truly leverage the cloud, a drastic shift in how we think about architecture is needed. I will give you a boring example: ETL. The old approach is to run a batch solution using a set of complex tools. While there is a time and place for these architectures, here is a cloud-based, real-time streaming architecture.
If the problem is to move data from a MySQL OLTP database to an OLAP star schema, use a MySQL binlog sniffer like https://github.com/zendesk/maxwell to convert deltas into a stream of Kinesis events and leverage Lambda functions to transform and load. Slap D3 on this, hosted on S3. Voila! Serverless, real-time analytics that can scale to handle millions of updates.
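The transform step in that pipeline can be sketched as a pure function. Maxwell emits one JSON object per row change with `type`, `ts` (epoch seconds) and `data` fields; the source columns (`id`, `customer_id`, `amount`) and the fact-table layout below are hypothetical, chosen only to show the shape of the mapping.

```python
from datetime import datetime, timezone


def to_fact_row(change):
    """Map one Maxwell-style change event to a row for a hypothetical fact table."""
    if change["type"] not in ("insert", "update"):
        return None  # this sketch ignores deletes
    data = change["data"]
    ts = datetime.fromtimestamp(change["ts"], tz=timezone.utc)
    return {
        "order_id": data["id"],
        "customer_key": data["customer_id"],
        # Star schemas commonly key the date dimension as YYYYMMDD.
        "date_key": int(ts.strftime("%Y%m%d")),
        "amount": float(data["amount"]),
    }
```

A Lambda function subscribed to the stream would call `to_fact_row` on each decoded record and write the non-empty results to the OLAP store.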
For serious SAAS applications, deriving insights from logs is crucial. The popular open source option for this is the ELK stack. Anybody who has run a non-trivial ELK cluster will attest to its complexity. Now Amazon provides ELK as a service.
Gone are the days when application logs were meant only for developers to debug. Instead, design a separate log specifically for ingestion by ELK and for analytics. Splunk is the popular enterprise option, but given its cost, ELK is becoming the dominant force. Now that Amazon is taking the headache out of ELK ops, you can focus on the actual analytics with Kibana and/or D3.
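One way to design such a log is to emit one JSON object per line, so the ingestion side never has to parse free text. The sketch below uses only Python's standard `logging` module; the field names (`ts`, `event`, `tenant`, `duration_ms`) and the `fields` convention are assumptions for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} become top-level keys.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def build_logger(name, stream=None):
    """Create a dedicated analytics logger, kept apart from debug logging."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False  # keep analytics events out of the debug log
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

A call such as `build_logger("analytics").info("form_submitted", extra={"fields": {"tenant": "acme", "duration_ms": 42}})` then produces a line that can be indexed field by field and charted in Kibana.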
There are many more software blocks on AWS and new ones are getting added at a fast pace. We will keep updating what we end up using here.