
The Road To Serverless: Multi-tenancy

Author: Ed Huang (i@huangdx.net), CTO@PingCAP

Imagine that you need to support a large-scale database service with thousands of tenants in a single cluster. Each tenant accesses only their own data, and there is a clear distinction between hot and cold data. As mentioned earlier, perhaps 90% of users are small-scale, but you cannot predict when any of them will suddenly grow. You need to design a system to handle this situation.

The most primitive method is to allocate one batch of machines to one user and another batch to another user. This physical isolation is straightforward, but it has two drawbacks: resource allocation is inflexible, and a lot of overhead gets duplicated instead of shared. For example, each tenant needs its own control system and log storage, which makes this the most cumbersome form of isolation.

The second way is to deploy on a container platform and use virtualization to make the most of the underlying cloud platform’s resources. This is somewhat better than the first solution, since it utilizes the underlying hardware more efficiently, but the duplicated overhead described above still exists.

The core idea behind both of these solutions is isolation. However, if you want the cost per user to fall as the number of users grows, emphasizing isolation alone is not enough; sharing is also required to bring costs down.

TiDB Serverless allows multiple tenants to share a physical cluster while remaining logically isolated from one another. This sharing of a single resource pool is the key to reducing unit costs.

A sharing-based approach normally comes with its own problems: resource sharing may be insufficient for large tenants, data from different tenants could become mixed up, and individual tenant requirements cannot be customized easily. Moreover, if a tenant requires customized functionality, the entire system has to be modified to accommodate it, which hurts maintainability. A more flexible and customizable multi-tenant implementation is therefore needed to meet the diverse needs of different tenants. Let’s see how TiDB Serverless addresses these issues. Before introducing the design, let’s first look at some important design goals:

To address the first goal, isolating the data of different tenants and making it invisible to each other, the question needs to be considered from two aspects: physical storage and metadata storage.

Isolating physical storage is relatively easy. As mentioned in a previous post, TiKV shards data internally, so even when the data is stored in S3 we can effectively distinguish data belonging to different tenants by key prefix. TiDB 5.0 introduced a mechanism called Placement Rules, which gives users a semantic-level interface for controlling the physical distribution of data (https://docs.pingcap.com/tidb/stable/placement-rules-in-sql). In the initial design of TiDB, the storage layer kept key encoding minimal for performance reasons: it had no concept of namespaces and no extra prefix on user data, such as a tenant ID, to distinguish data from different users. That work now needs to be done. It is not particularly difficult in the storage layer itself; more thought has to go into compatibility and migration plans.
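To make the idea concrete, here is a minimal sketch of tenant-prefixed key encoding. It is not TiDB’s actual encoding; the function names and the fixed-width 8-byte tenant ID are illustrative assumptions, chosen only to show how a prefix keeps each tenant’s data in a distinct, contiguous key range.

```go
// A minimal sketch (not TiDB's actual key layout) of prefixing storage keys
// with a tenant ID so that data from different tenants never interleaves.
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeKey prepends a fixed-width tenant ID to the raw key. Big-endian
// encoding preserves lexicographic order, so range scans stay per-tenant.
func encodeKey(tenantID uint64, rawKey []byte) []byte {
	buf := make([]byte, 8+len(rawKey))
	binary.BigEndian.PutUint64(buf[:8], tenantID)
	copy(buf[8:], rawKey)
	return buf
}

// decodeKey splits an encoded key back into the tenant ID and the raw key.
func decodeKey(key []byte) (uint64, []byte) {
	return binary.BigEndian.Uint64(key[:8]), key[8:]
}

func main() {
	k := encodeKey(42, []byte("t_100_r_1")) // hypothetical row key
	tenant, raw := decodeKey(k)
	fmt.Println(tenant, string(raw)) // 42 t_100_r_1
}
```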

In traditional distributed databases, metadata storage is part of the system itself. For example, in traditional TiDB, the main metadata stored by the PD component is the mapping between TiKV’s Key-Value data and the specific storage nodes that hold it. In a DBaaS (Database-as-a-Service) environment, however, metadata goes well beyond that. A better design is to extract metadata storage into a separate, shared service that serves multiple clusters. Here are the reasons:

Regarding the invisibility between tenants, there is another important module involved: the Gateway mentioned above. Without it, sharing an underlying TiDB cluster would not be feasible. This is easy to understand; it is similar to how you cannot have multiple root accounts in one database. Since TiDB speaks the MySQL protocol, achieving multi-tenancy means the tenant’s identity has to be carried somewhere. We chose a workaround: since tenant information has to be identified anyway, it should be done at the beginning of a session. We only need to pass the tenant ID during authentication, and once the session is established, we naturally know which tenant the connection belongs to. To stay compatible with the standard MySQL protocol, we do not use session variables; instead, we simply add a prefix to the username. That’s why you see some strange-looking prefixes before the usernames in TiDB Serverless.
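As a rough illustration of this idea, the sketch below resolves a tenant from a prefixed login name at authentication time. The dot-separated prefix format and the `resolveTenant` helper are assumptions for illustration, not the Gateway’s real parsing logic.

```go
// A minimal sketch, assuming the tenant identifier is carried as a
// dot-separated prefix on the MySQL username (e.g. "abc123.root").
package main

import (
	"fmt"
	"strings"
)

type sessionInfo struct {
	TenantID string // resolved once, at authentication time
	User     string // the "real" database user within that tenant
}

// resolveTenant splits the login username into the tenant prefix and the user.
func resolveTenant(loginUser string) (sessionInfo, error) {
	parts := strings.SplitN(loginUser, ".", 2)
	if len(parts) != 2 || parts[0] == "" {
		return sessionInfo{}, fmt.Errorf("missing tenant prefix in username %q", loginUser)
	}
	return sessionInfo{TenantID: parts[0], User: parts[1]}, nil
}

func main() {
	s, err := resolveTenant("abc123.root")
	if err != nil {
		panic(err)
	}
	// Every statement in this session can now be scoped to s.TenantID.
	fmt.Printf("tenant=%s user=%s\n", s.TenantID, s.User)
}
```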

Once the tenant ID is known at the beginning of the connection, all logical isolation can be enforced through this ID. The Gateway does more than a regular proxy: in TiDB Serverless, we extracted the session management module directly from the original TiDB codebase to implement it.

This design brings several benefits:

The Gateway is stateless and long-running. Weighing the added latency against the benefits, we believe the trade-off is acceptable.

By modifying the Gateway and the metadata, we achieved logical isolation of multiple tenants on a single physical cluster. But how do we avoid the “Noisy Neighbor” problem? As mentioned earlier, we introduced the concept of the RU (Request Unit), which brings us to the Resource Control framework introduced in TiDB 7.0 (https://docs.pingcap.com/tidb/dev/tidb-resource-control). Similar to Dynamo, TiDB’s Resource Control uses the Token Bucket algorithm (https://en.wikipedia.org/wiki/Token_bucket): it maps the consumption of different types of physical resources to RUs and applies global resource control through token buckets.
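For illustration, here is a minimal, single-node sketch of the token-bucket idea applied to RUs. The struct, numbers, and function names are assumptions for the sketch; TiDB’s actual Resource Control coordinates token buckets globally rather than on a single node.

```go
// A minimal sketch of a per-tenant token bucket that refills RUs at a fixed
// rate and debits each request by its RU cost. Illustrative only.
package main

import (
	"fmt"
	"time"
)

type tokenBucket struct {
	capacity   float64 // maximum RUs the bucket can hold (burst size)
	tokens     float64 // RUs currently available
	fillRate   float64 // RUs added per second (the tenant's steady quota)
	lastRefill time.Time
}

func newTokenBucket(capacity, fillRate float64) *tokenBucket {
	return &tokenBucket{capacity: capacity, tokens: capacity, fillRate: fillRate, lastRefill: time.Now()}
}

// allow refills the bucket based on elapsed time, then tries to debit the
// RU cost of one request. It returns false if the tenant is over its quota.
func (b *tokenBucket) allow(ruCost float64) bool {
	now := time.Now()
	b.tokens += now.Sub(b.lastRefill).Seconds() * b.fillRate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.lastRefill = now
	if b.tokens < ruCost {
		return false // throttle: not enough RUs accumulated
	}
	b.tokens -= ruCost
	return true
}

func main() {
	bucket := newTokenBucket(100, 50) // 100 RU burst, 50 RU/s steady rate
	for i := 0; i < 5; i++ {
		fmt.Println("request", i, "allowed:", bucket.allow(30))
	}
}
```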

The benefits of implementing resource isolation through this approach compared to rigid physical isolation are evident:

However, building a multi-tenant service on a shared large cluster poses a challenge: controlling the blast radius. For example, if the service experiences a failure or a severe bug, the impact may extend beyond a single tenant to a much broader range. Currently, our solution is simple sharding: a failure in one region does not affect other regions. In addition, for large tenants we still provide traditional dedicated-cluster services, which can be considered an application-level answer to this problem. Of course, we are continuing to explore this area.