Let’s talk Scale

I’ll admit, I haven’t been great about getting new posts up over the past couple of weeks since I went back to school. Yes, yes, I know, I should be able to multitask and get some good blog posts up from time to time, but it’s crazy how 24 hours can go by in the blink of an eye. Anyway, that notwithstanding, I thought this time I would discuss the idea of scale: why one would need to scale an application or service, and the types of problems that can rear their heads if it isn’t designed appropriately.

How many times have you heard someone say “this can’t scale” or “this won’t scale” and immediately thought “well, you didn’t design it with that architecture in mind!”? If you’re like me, that’s your typical gut response to those types of statements. Why is it so complicated to create something that starts small and scales? Answer: it isn’t, IF you think about and know what you are trying to achieve. Just because you can do something doesn’t mean you should. Take, for instance, EMC ScaleIO, software-defined storage at scale. ScaleIO allows you to start small and grow to hundreds or thousands of nodes that auto-balance, expand/contract and protect data automatically with no manual tuning needed. Okay, great, but why do I care? Exactly. This type of architecture is great IF, and I’ll say it again, IF your application calls for it.

Want an example of what I’m talking about? Let’s take MongoDB. For those of you who aren’t familiar with MongoDB, it is a NoSQL, document-oriented database that stores data as JSON-like documents. Contrast this with a key-value store like Memcached or Redis, or a tabular database like Bigtable or HBase, and the need for scale can start to rear its ugly head. For those of you who are familiar with a traditional RDBMS such as Oracle, MongoDB stores a business function in the minimal number of documents versus breaking it up into multiple relational structures like Oracle would typically do. Why does this matter? What happens when you want to expand an Oracle database? You typically have to vertically scale (scale up) your environment to meet this challenge, which usually means moving to beefier boxes filled with more CPU and memory. Ouch, why would anyone do that? Why wouldn’t you just add more nodes to your cluster and expand or shrink it as your dataset expands and shrinks?
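To make the “minimal number of documents” idea a bit more concrete, here is a rough Python sketch that uses plain dictionaries and lists, not a real database. The order, customer and item names are made up purely for illustration. It shows the same business object split across relational-style tables and then kept as one self-contained document.

```python
import json

# Relational-style layout: one business object (an order) is split across
# several "tables" and has to be joined back together by key at read time.
# All names and values here are hypothetical illustration data.
customers = {1: {"name": "Ada"}}
orders = {100: {"customer_id": 1}}
order_items = [
    {"order_id": 100, "sku": "disk-4tb", "qty": 2},
    {"order_id": 100, "sku": "nic-10g", "qty": 1},
]


def load_order_relational(order_id):
    """Reassemble an order by 'joining' the three tables above."""
    order = orders[order_id]
    return {
        "_id": order_id,
        "customer": customers[order["customer_id"]],
        "items": [
            {"sku": row["sku"], "qty": row["qty"]}
            for row in order_items
            if row["order_id"] == order_id
        ],
    }


# Document-style layout: the same order kept as one self-contained document.
order_document = {
    "_id": 100,
    "customer": {"name": "Ada"},
    "items": [
        {"sku": "disk-4tb", "qty": 2},
        {"sku": "nic-10g", "qty": 1},
    ],
}

if __name__ == "__main__":
    # Both layouts describe the same order; only the shape differs.
    assert load_order_relational(100) == order_document
    print(json.dumps(order_document, indent=2))
```

Because the document carries everything the business function needs, it can live on whichever node owns it without a cross-table join, which is part of why this shape tends to lend itself to scaling out.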

The answer is you would and should, IF this is your desired state and you are ready to tackle the challenge of switching from a vertical scale-up architecture to a horizontal scale-out architecture such as MongoDB or Cassandra (more on Cassandra in a later post). As you can see from the image above, there is certainly nothing wrong with scaling an application or dataset up or out, but remember: just because you can do something doesn’t mean you should. Please take the time to think about why you are considering a NoSQL database like MongoDB and what type of environment you wish to deploy it on. Perhaps it is something like ScaleIO, as mentioned earlier, or maybe it’s Azure or AWS. That choice is left up to you, as are the circumstances that befall you if you try to, wait for it... SCALE!!!
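For a feel of what scaling out looks like in practice, here is a toy Python sketch that spreads keys across nodes by hashing. To be clear, this is a simple hash-modulo placement I made up for illustration, not the actual algorithm used by MongoDB, Cassandra or ScaleIO, and the node and key names are hypothetical.

```python
import hashlib
from collections import Counter


def node_for(key, nodes):
    """Pick a node for a key using a stable hash (toy modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def placement(keys, nodes):
    """Count how many keys land on each node."""
    return Counter(node_for(key, nodes) for key in keys)


# Hypothetical workload: ten thousand order keys.
keys = [f"order-{i}" for i in range(10_000)]

three_nodes = placement(keys, ["node-a", "node-b", "node-c"])
four_nodes = placement(keys, ["node-a", "node-b", "node-c", "node-d"])

if __name__ == "__main__":
    # Adding a node spreads the same data over more boxes instead of
    # making any single box bigger.
    print("3 nodes:", dict(three_nodes))
    print("4 nodes:", dict(four_nodes))
```

One thing this sketch glosses over: with plain modulo placement, most keys move to a different node whenever the node count changes. Real scale-out systems go to some trouble to limit that data movement, which is exactly the kind of design work you sign up for when you decide to scale out.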

Where have you run into problems scaling an application or data set?


