Book: Designing Data-Intensive Applications by Martin Kleppmann
Components needed:
For measuring scalability, we need to define the load and performance parameters.
Load should be described in terms of the system's bottlenecks. You should have a fair estimate of the average read/write request rate and the cache hit and miss rates, and take into account the MTTF of hard disks, the number of CPUs running in parallel, etc.
Response time as a performance parameter: consider queueing delays, execution delays, etc. Even for the same request under similar conditions, response time can vary from one attempt to the next because of a context switch, a garbage collection pause, packet loss and TCP retransmission, etc. So treat response time as a distribution and measure it with percentiles rather than a single number. The median (50th percentile) is a good start. High percentiles such as the 95th, 99th, and 99.9th (p95, p99, p999) describe the tail latencies, and many companies use them to measure their internal service response times. To add response time to your service's monitoring dashboard, maintain a rolling window of response times (say, the last 10 minutes), keep recomputing the median and the percentiles over that window, and plot them. Chapter 1 points to efficient approximation algorithms for computing these, such as forward decay, t-digest, and HdrHistogram.
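The rolling-window idea can be sketched in a few lines of Python. This is a naive illustration, not one of the efficient algorithms the book points to: it keeps every sample and sorts on each query. The class name RollingPercentiles, its methods, and the nearest-rank percentile rule are assumptions made for this sketch.

```python
import math
import time
from collections import deque


class RollingPercentiles:
    """Keeps response times from the last `window_seconds` and reports percentiles.

    Hypothetical helper for illustration only: it stores every sample and
    sorts on each query, which is fine for small windows but not for
    high-throughput services.
    """

    def __init__(self, window_seconds=600.0):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, response_time_ms), oldest first

    def record(self, response_time_ms, now=None):
        """Add one measurement and drop samples that fell out of the window."""
        now = time.monotonic() if now is None else now
        self._samples.append((now, response_time_ms))
        self._evict(now)

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def percentile(self, p, now=None):
        """Nearest-rank percentile of the samples still inside the window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if not self._samples:
            return None
        ordered = sorted(rt for _, rt in self._samples)
        # Smallest value such that at least p% of samples are at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]


window = RollingPercentiles(window_seconds=600.0)
for ms in (12, 15, 11, 250, 14, 13, 16, 12, 900, 15):
    window.record(ms)
print("median:", window.percentile(50), "p99:", window.percentile(99))
```

A dashboard would call percentile(50), percentile(95), and percentile(99) on a timer and plot the results. Note that percentiles from several machines cannot be averaged; the underlying samples or histograms have to be combined instead.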
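Head-of-line blocking: a few slow requests hold up the processing of the requests queued behind them. Tail latency amplification: when one end-user request needs several backend calls, a single slow call makes the whole request slow.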
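Tail latency amplification can be made concrete with a little arithmetic. The sketch below assumes that backend calls are slow independently of each other and that the end-user request has to wait for all of them; the function name and the example numbers are illustrative, not taken from the book.

```python
def chance_of_slow_request(slow_call_fraction, backend_calls):
    """Probability that at least one of `backend_calls` calls is slow.

    Assumes each call is slow independently with probability
    `slow_call_fraction`, and that the request waits for every call.
    """
    return 1 - (1 - slow_call_fraction) ** backend_calls


# With 1% of backend calls slow, compare a request that makes a single
# call against requests that fan out to 10 and 100 calls.
for calls in (1, 10, 100):
    print(calls, round(chance_of_slow_request(0.01, calls), 3))
```

Under these assumptions, a request that fans out to 100 backend calls has roughly a 63% chance of hitting at least one slow call, even though only 1% of individual calls are slow. This is why high percentiles matter more as fan-out grows.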
You may also have to choose between automatic and manual scaling. Each has its own benefits and drawbacks. Use manual scaling when load increases can be predicted accurately well in advance.
Some more factors to consider: volume of reads/writes, ratio of read to write operations, volume of data, complexity of the data (time series, spatial), access patterns (if there is a very common pattern of data access, optimize the system for it), response time requirements, etc.
An architecture that scales well for a particular application is built around assumptions of which operations will be common and which will be rare: the load parameters. If those assumptions turn out to be wrong, the engineering effort for scaling is at best wasted, and at worst counterproductive.
Availability and maintainability, using abstractions to make things simpler.
Polyglot persistence, impedance mismatch (the need for a translation layer between different data models).
Refs:
http://nathanmarz.com/blog/principles-of-software-engineering-part-1.html