Terminology Confusion: Horizontal/Vertical Partitioning, Scaling, Sharding
If you are working on a large database and want to improve its performance, you will run into a lot of similar-sounding keywords.
Here are the main ones:
- Horizontal Scaling
- Vertical Scaling
- Horizontal Partitioning
- Vertical Partitioning
- Sharding
I want to explain these keywords in simple sentences.
Horizontal scaling means adding new servers to the infrastructure so that you can divide the workload among them. A typical scenario: write data to one server and read from the others.
See also: Read replica, leader-follower replication.
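To make the "write to one server, read from the others" scenario concrete, here is a minimal in-memory sketch. The `ReplicatedStore` class and its method names are hypothetical, and plain dictionaries stand in for real database servers. It also assumes replication is instant, which is usually not the case in practice.

```python
import itertools
from typing import Optional, Tuple


class ReplicatedStore:
    """Toy leader-follower setup: one leader for writes, followers for reads."""

    def __init__(self, follower_count: int) -> None:
        self.leader = {}  # stands in for the write server
        self.followers = [{} for _ in range(follower_count)]  # read servers
        self._next_follower = itertools.cycle(range(follower_count))

    def write(self, key: str, value: str) -> None:
        # Every write goes to the leader first.
        self.leader[key] = value
        # Simplification: copy to the followers immediately.
        for follower in self.followers:
            follower[key] = value

    def read(self, key: str) -> Tuple[int, Optional[str]]:
        # Reads rotate over the followers (round-robin).
        index = next(self._next_follower)
        return index, self.followers[index].get(key)


store = ReplicatedStore(follower_count=2)
store.write("user:1", "Alice")
first_read = store.read("user:1")   # served by follower 0
second_read = store.read("user:1")  # served by follower 1
```

Adding one more follower increases the read capacity without touching the leader, which is the point of scaling horizontally.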
Vertical scaling means increasing the capacity of a single server: adding more CPU/RAM/disk or migrating to a bigger server. Because there is still only one server, the database becomes inaccessible if that server crashes.
See also: Single point of failure.
Horizontal partitioning means splitting one table into several tables that all have the same structure. For example, rows from the Users table are moved (or copied) to the other tables.
All of the data is usually kept in one database instance, but that is not required. If you spread the data across multiple servers, it is often referred to as sharding (which is explained below).
If the application is mostly used by people younger than 25, you can split the table by the Age column. Because the index for that group gets smaller, the response time for your target audience also decreases.
See also: Range-based sharding.
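The Age example can be sketched in a few lines. This is only an illustration: the partition names `users_under_25` and `users_25_and_over` and the sample rows are made up, and Python lists stand in for the actual tables.

```python
def partition_for_age(age: int) -> str:
    """Return the name of the (hypothetical) partition for a given age."""
    return "users_under_25" if age < 25 else "users_25_and_over"


# Both partitions hold rows with exactly the same structure.
partitions = {"users_under_25": [], "users_25_and_over": []}

users = [
    {"name": "Ada", "age": 19},
    {"name": "Grace", "age": 41},
    {"name": "Linus", "age": 24},
]
for user in users:
    partitions[partition_for_age(user["age"])].append(user)

# A query for the target audience only has to look at the smaller partition.
young_names = [user["name"] for user in partitions["users_under_25"]]
```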
Vertical partitioning means splitting one table into several tables, each with a different structure (a different subset of the columns).
In some applications, only a portion of the columns is needed most of the time. Let's say you need the user's mother's and father's names only in the admin panel, while the rest of the application just needs the user's name and age. You can then move the parents' names to a separate table.
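Here is a small sketch of that split using Python's built-in `sqlite3` module. The table name `user_parents`, the column names, and the sample row are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Narrow table with the columns most of the application needs.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    -- Rarely used columns live in their own table, linked by the same id.
    CREATE TABLE user_parents (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        mother_name TEXT,
        father_name TEXT
    );
    INSERT INTO users VALUES (1, 'Ada', 19);
    INSERT INTO user_parents VALUES (1, 'Anne', 'George');
    """
)

# Most of the application reads only the narrow table.
profile = conn.execute(
    "SELECT name, age FROM users WHERE id = ?", (1,)
).fetchone()

# The admin panel joins the extra columns back in when it needs them.
admin_view = conn.execute(
    """
    SELECT u.name, u.age, p.mother_name, p.father_name
    FROM users AS u
    JOIN user_parents AS p ON p.user_id = u.id
    WHERE u.id = ?
    """,
    (1,),
).fetchone()
```

The trade-off is that the admin panel now pays for a join, while every other query works on a smaller table.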
Sharding and horizontal partitioning refer to the same basic idea: splitting one table into several tables that have the same structure.
However, the term sharding is normally used when the data is spread across servers, data centers, or continents. In that sense, sharding is a subtype of horizontal partitioning.
There are two popular sharding strategies: range-based (explained in the horizontal partitioning section) and hash-based.
With hash-based sharding, the database engine calculates the target server from a hash of the shard key, typically the table's primary key.
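A minimal sketch of the hash-based strategy could look like the following. The server names in `SHARDS` and the `shard_for` function are hypothetical, and real database engines use their own hash functions and routing logic.

```python
import hashlib

# Hypothetical server names, one per shard.
SHARDS = ["db-server-0", "db-server-1", "db-server-2"]


def shard_for(primary_key: int) -> str:
    """Pick a target server from a stable hash of the primary key."""
    digest = hashlib.sha256(str(primary_key).encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into an integer, then take the modulo.
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


target = shard_for(42)  # the same key always maps to the same server
```

Note that with a plain modulo, changing the number of servers remaps most keys. Consistent hashing is a common way to reduce that.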