In-depth analysis of database and table sharding: principles, primary key generation, paging query, distributed transactions and high availability practices

1. Introduction

In large-scale Internet applications, as the amount of data continues to grow, the single-database, single-table architecture cannot meet the needs of high concurrency and big data storage.Sub-library and sub-tableIt is a common database architecture optimization solution that can improve database throughput, reduce the amount of data in a single table, and improve query efficiency. However, sharding also brings many complex problems, such asPrimary key generation, paging query, distributed transaction, cross-table query, high availabilitywait.

This article will explain in detail the core concepts, key technologies and implementation methods of sharding, and combine sample code to help developers understand how to apply sharding in actual projects.


2. Overview of sharding

2.1 What is sharding?

Database and table sharding is a technology that splits the data originally stored in a database table into multiple databases or multiple tables. There are mainly two ways:

  1. Sharding: According to a field (such as user_id) to split the data into different database instances and tables. For example:
    • User ID 1~1000 exist db1.user_table_1,1001~2000 exist db1.user_table_2
    • User ID 2001~3000 exist db2.user_table_1,3001~4000 exist db2.user_table_2
  2. Vertical Partitioning: Split according to business modules, such as User Table and Order Form Stored in different databases, suitable for data isolation of different business systems.

2.2 Why use sharding?

  • Solve the performance bottleneck of a single database:A single database cannot handle a large number of concurrent requests, and splitting the database can balance the traffic pressure.
  • Improve reading and writing efficiency: Reduce the amount of data in a single table and improve query and insertion speed.
  • Reduce storage costs: You can use multiple lower-configuration database instances to replace a high-configuration database.
  • Improve system high availability: Multiple database instances can be independent of each other to avoid single point of failure.

3. Generate primary keys for sub-databases and sub-tables

After the database and table are divided, the generation of a globally unique primary key becomes a key issue. Common primary key generation methods include:

  1. UUID: Globally unique, but long, with high storage overhead, and not conducive to index query.
  2. Database auto-increment ID: Available on a single machine, but cannot be unique across databases.
  3. Snowflake:

    • The distributed ID generation algorithm proposed by Twitter is based on timestamp + machine ID + serial number
    • Go implementation example:

      package main
      
      import (
       "fmt"
       "sync"
       "time"
      )
      
      const (
       workerIDBits  = 5
       datacenterIDBits = 5
       sequenceBits  = 12
      
       maxWorkerID  = -1 ^ (-1 << workerIDBits)
       maxDatacenterID = -1 ^ (-1 << datacenterIDBits)
       maxSequence  = -1 ^ (-1 << sequenceBits)
      )
      
      type Snowflake struct {
       mu           sync.Mutex
       lastTimestamp int64
       workerID     int64
       datacenterID int64
       sequence     int64
      }
      
      func NewSnowflake(workerID, datacenterID int64) *Snowflake {
       return &Snowflake{
           workerID:     workerID,
           datacenterID: datacenterID,
       }
      }
      
      func (s *Snowflake) NextID() int64 {
       s.mu.Lock()
       defer s.mu.Unlock()
      
       timestamp := time.Now().UnixNano() / 1e6
       if timestamp == s.lastTimestamp {
           s.sequence = (s.sequence + 1) & maxSequence
           if s.sequence == 0 {
               for timestamp <= s.lastTimestamp {
                   timestamp = time.Now().UnixNano() / 1e6
               }
           }
       } else {
           s.sequence = 0
       }
       s.lastTimestamp = timestamp
       return ((timestamp << (workerIDBits + datacenterIDBits + sequenceBits)) |
           (s.datacenterID << (workerIDBits + sequenceBits)) |
           (s.workerID << sequenceBits) |
           s.sequence)
      }
      
      func main() {
       sf := NewSnowflake(1, 1)
       fmt.Println(sf.NextID())
      }
  4. Database segment number segment method:
    • Preallocate ID segments, such as db1 generate 1~10000,db2 generate 10001~20000, which is suitable for a small number of databases.

4. Paginated query by database, table, and page

Paging query becomes complicated after sharding, because the data is scattered in multiple tables and databases. Common solutions are:

  1. Full database query + merge sort:
    SELECT * FROM (
       SELECT id, name, create_time FROM db1.user_table_1
       UNION ALL
       SELECT id, name, create_time FROM db1.user_table_2
    ) AS temp
    ORDER BY create_time DESC
    LIMIT 10 OFFSET 0;
  2. Leverage caching: First query the primary key ID, then query the detailed data in the sub-database to reduce the overhead of cross-database queries.

5. Distributed transactions after sharding

The core issue of distributed transactions is transaction consistency between multiple database instances. Common solutions are:

  1. TCC (Try-Confirm-Cancel) mode
  2. Local message table + reliable message queue
  3. Seata distributed transaction framework

Example: Use golang + gorm Solving distributed transaction problems:

func createOrder(db1, db2 *gorm.DB) error {
    return db1.Transaction(func(tx *gorm.DB) error {
        if err := tx.Exec("UPDATE users SET balance = balance - 100 WHERE id = ?", 1).Error; err != nil {
            return err
        }
        if err := db2.Exec("INSERT INTO orders (user_id, amount) VALUES (?, ?)", 1, 100).Error; err != nil {
            return err
        }
        return nil
    })
}

9. Conclusion

This article describes in detailPrinciples of sharding, primary key generation, paging query, distributed transactions, query without sharding key, capacity estimation and high availability solutions, and explains how to solve key problems with code examples.

🔗 References:

No Comments

Send Comment Edit Comment

|´・ω・)ノ
ヾ(≧∇≦*)ゝ
(☆ω☆)
(╯‵□′)╯︵┴─┴
 ̄﹃ ̄
(/ω\)
∠(ᐛ 」∠)_
(๑•̀ㅁ•́ฅ)
→_→
୧(๑•̀⌄•́๑)૭
٩(ˊᗜˋ*)و
(ノ°ο°)ノ
(´இ皿இ`)
⌇●﹏●⌇
(ฅ´ω`ฅ)
(╯°A°)╯︵○○○
φ( ̄∇ ̄o)
ヾ(´・ ・`。)ノ"
( ง ᵒ̌ᵒ̌)ง⁼³₌₃
(ó﹏ò。)
Σ(っ°Д °;)っ
( ,,´・ω・)ノ"(´っω・`。)
╮(╯▽╰)╭
o(*////▽////*)q
>﹏<
( ๑´•ω•) "(ㆆᴗㆆ)
😂
😀
😅
😊
🙂
🙃
😌
😍
😘
😜
😝
😏
😒
🙄
😳
😡
😔
😫
😱
😭
💩
👻
🙌
🖕
👍
👫
👬
👭
🌚
🌝
🙈
💊
😶
🙏
🍦
🍉
😣
Source: github.com/k4yt3x/flowerhd
Emoticons
Emoji
Little Dinosaur
flower!
Previous