MySQL Data Migration Knowledge Graph
1 Core Challenges and Technology Selection
1.1 Three major challenges of data migration
- Business continuity requirements: financial-grade services require 99.99% availability year-round
- Data consistency guarantee: monetary data must be migrated with zero errors (refer to the Central Bank's "Payment System Data Security Specification")
- Incremental sync complexity: keeping replication lag under control at workloads of 10,000 TPS
1.2 Tool comparison matrix
Tool Name | Backup Type | Lock Mechanism | Recovery Speed | Applicable Scenarios | Official Documentation |
---|---|---|---|---|---|
mysqldump | Logical backup | Table lock | Slow | Small and medium data migration | MySQL 8.0 Reference Manual |
XtraBackup | Physical backup | Lock-free snapshot | Fast | TB-scale data migration | Percona XtraBackup Docs |
pt-table-sync | Data synchronization | Row-level validation | Medium | Data consistency repair | Percona Toolkit Docs |
1.3 Key auto-increment primary key parameters
-- View the current autoinc lock mode
SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';
-- innodb_autoinc_lock_mode is not a dynamic variable, so SET GLOBAL will fail;
-- change it in my.cnf and restart the server:
-- [mysqld]
-- innodb_autoinc_lock_mode = 1
Parameter description (refer to the MySQL Reference Manual):
- 0: Traditional mode (the table-level AUTO-INC lock is held until the end of the INSERT statement)
- 1: Consecutive mode (default up to MySQL 5.7; batch inserts receive consecutive values)
- 2: Interleaved mode (default in MySQL 8.0; may produce gaps under high concurrency)
2 Detailed explanation of the four-phase migration plan
2.1 Data initialization phase
2.1.1 Physical backup solution
# Full backup (Percona XtraBackup)
innobackupex --user=backup --password=xxx --parallel=4 --compress /backup/
# Restore backup
innobackupex --apply-log /backup/full/
innobackupex --copy-back /backup/full/
2.1.2 Logical backup optimization techniques
# Multi-threaded export (mydumper)
mydumper -u root -p xxx -B db1 -T table1 -t 8 -o /backup/
# Fast import (set session-level parameters)
mysql -h target_db -uroot -p db1 \
  --init-command="SET SESSION foreign_key_checks=0; SET SESSION unique_checks=0; SET SESSION sql_log_bin=0;" \
  < table1.sql
2.2 First data verification
2.2.1 Full Verification Solution
# Create a dedicated verification account (follow the principle of least privilege)
CREATE USER 'checksum_user'@'%' IDENTIFIED BY 'xxx';
GRANT SELECT, PROCESS, REPLICATION CLIENT ON *.* TO 'checksum_user'@'%';
# Execute verification (Percona Toolkit)
pt-table-checksum --nocheck-replication-filters --no-check-binlog-format \
  --databases=db1 --tables=target_table h=source_host,u=checksum_user,p=xxx
2.2.2 Difference repair process
pt-table-sync --execute --verbose --transaction --print \
h=source_host,D=db1,t=target_table h=target_host,u=admin,p=xxx
2.3 Incremental synchronization phase
2.3.1 Binlog monitoring solution
# Python example (mysql-replication library)
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent,
    UpdateRowsEvent,
    WriteRowsEvent,
)

stream = BinLogStreamReader(
    connection_settings={
        "host": "source_host",
        "port": 3306,
        "user": "repl",
        "passwd": "xxx",
    },
    server_id=100,       # must be unique among the source's replicas
    blocking=True,       # keep waiting for new events
    resume_stream=True,  # continue from the last binlog position
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
)
for binlogevent in stream:
    for row in binlogevent.rows:
        process_event(row)  # custom handler: apply the change to the target DB
2.3.2 Double write implementation example (Java Spring)
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class DoubleWriteAspect {

    @Value("${double.write.mode}")
    private String writeMode;

    // pjp.proceed() performs the original repository save (source database);
    // saveToTarget() mirrors the entity to the target database.
    @Around("execution(* com.example.repository.*.save*(..))")
    public Object aroundWrite(ProceedingJoinPoint pjp) throws Throwable {
        Object entity = pjp.getArgs()[0];
        Object result;
        if ("SOURCE_FIRST".equals(writeMode)) {
            result = pjp.proceed();   // source write first
            saveToTarget(entity);     // then mirror to target
        } else if ("TARGET_FIRST".equals(writeMode)) {
            saveToTarget(entity);     // target write first
            result = pjp.proceed();   // then the original source write
        } else {
            result = pjp.proceed();   // double write disabled
        }
        return result;
    }

    private void saveToTarget(Object entity) {
        // write to the target datasource (implementation omitted)
    }
}
2.4 Traffic switching phase
2.4.1 Gradual Switching Solution
- Grayscale release: shift traffic to the target database in 5% increments through the configuration center (a routing sketch follows this list)
- Real-time monitoring: Monitor the following core indicators:
- Target database QPS growth rate
- Master-slave replication delay (Seconds_Behind_Master)
- Number of slow queries (Slow_queries)
- Automatic circuit breaker: set an automatic rollback threshold, e.g.:
# Monitoring rule example (Prometheus)
- alert: MigrationErrorRateHigh
  expr: increase(migration_errors_total[1m]) > 10
  for: 2m
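The gradual cutover itself can be sketched in a few lines of Python. This is a minimal illustration, assuming the current percentage is read from a config-center key (the name migration.traffic.percent is hypothetical) and that routing is decided per request:
# Routing sketch for the grayscale stage (illustrative only)
import random

MIGRATION_TRAFFIC_PERCENT = 5  # hypothetical config-center value, raised in 5% steps

def choose_datasource(percent: int) -> str:
    """Route one request: 'target' with probability percent/100, else 'source'."""
    return "target" if random.randint(1, 100) <= percent else "source"

# At the 5% stage roughly 1 request in 20 should hit the target database
counts = {"source": 0, "target": 0}
for _ in range(10_000):
    counts[choose_datasource(MIGRATION_TRAFFIC_PERCENT)] += 1
print(counts)  # e.g. {'source': 9496, 'target': 504}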
3 Best Practices for Production Environments
3.1 Index Optimization Strategy
-- Hide secondary indexes during migration (note: an invisible index is still
-- maintained on writes; for maximum import speed, drop and rebuild it instead)
ALTER TABLE target_table ALTER INDEX idx_name INVISIBLE;
-- Make the index visible again after data import
ALTER TABLE target_table ALTER INDEX idx_name VISIBLE;
-- If the index was dropped, rebuild it with Online DDL:
ALTER TABLE target_table ADD INDEX idx_name (col_name), ALGORITHM=INPLACE, LOCK=NONE;
3.2 Connection Pool Configuration
# HikariCP configuration example
spring:
datasource:
hikari:
maximum-pool-size: 100        # raised from 50 for the migration window
connection-timeout: 10000     # raised from 3000 ms
leak-detection-threshold: 60000
keepalive-time: 30000
3.3 Monitoring indicator list
Indicator Name | Alert Threshold | Alert Strategy |
---|---|---|
Master-slave replication lag | >30 seconds | Enterprise WeChat/DingTalk alert (polling sketch below) |
Thread connections | >max_connections * 0.8 | Automatically expand the connection pool |
InnoDB buffer pool hit ratio | <95% | Optimize queries or add memory |
Checksum differences | >0 | Trigger the automatic repair process |
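The replication-lag indicator from the table above can be polled with a short script. A minimal sketch, assuming a monitor account with the REPLICATION CLIENT privilege and placeholder host/credentials:
# Replication lag poller (illustrative sketch)
import pymysql

LAG_THRESHOLD_SECONDS = 30  # matches the threshold in the table above

conn = pymysql.connect(host="target_host", user="monitor", password="xxx")
try:
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("SHOW SLAVE STATUS")  # SHOW REPLICA STATUS on MySQL 8.0.22+
        status = cur.fetchone()
        lag = status["Seconds_Behind_Master"] if status else None
        if lag is None or lag > LAG_THRESHOLD_SECONDS:
            print(f"ALERT: replication lag is {lag}s")  # hook up WeChat/DingTalk here
finally:
    conn.close()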
4 Typical troubleshooting solutions
4.1 Primary key conflict exception
Symptom: Duplicate entry 'xxx' for key 'PRIMARY'
Solution:
- Pause the incremental synchronization process
- Repair the conflicting row:
REPLACE INTO target_table SELECT * FROM source_table WHERE id=xxx;
- Reset the auto-increment sequence (AUTO_INCREMENT cannot take a subquery, so compute the value first):
SET @next_id = (SELECT MAX(id) + 1000 FROM source_table);
SET @sql = CONCAT('ALTER TABLE target_table AUTO_INCREMENT = ', @next_id);
PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;
4.2 Data Drift Problem
Troubleshooting steps:
- Pin down three reference time points:
- T1: Full backup time point
- T2: Incremental synchronization start time
- T3: Current time point
- Perform differential analysis for each window:
# Differences introduced between the full backup (T1) and the start of incremental sync (T2)
pt-table-checksum --since='T1' --until='T2' h=source_host
# Differences introduced after incremental sync started (T2 onwards)
pt-table-checksum --since='T2' h=source_host
5 Evolution direction and further reading
5.1 Intelligent verification system
- Machine learning prediction: train prediction models on historical difference data
- Dynamic sampling validation: automatically switch large tables to a sampling validation strategy (see the paper "Efficient Data Validation for Large Databases"); a sketch follows
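The sampling idea can be illustrated as follows. This is a sketch only: the CRC32 row digest, the column names col1/col2, and the id range are assumptions, and a real system would escalate any sampled mismatch to a full pt-table-checksum run:
# Sampling validation sketch (illustrative only)
import random
import pymysql

def sampled_checksums(conn, table, id_sample):
    """CRC32 digest per sampled primary key; columns col1, col2 are placeholders."""
    placeholders = ",".join(["%s"] * len(id_sample))
    sql = (f"SELECT id, CRC32(CONCAT_WS('#', col1, col2)) "
           f"FROM {table} WHERE id IN ({placeholders})")
    with conn.cursor() as cur:
        cur.execute(sql, id_sample)
        return dict(cur.fetchall())

source = pymysql.connect(host="source_host", user="checksum_user", password="xxx", database="db1")
target = pymysql.connect(host="target_host", user="checksum_user", password="xxx", database="db1")

ids = random.sample(range(1, 10_000_000), 1000)  # sample instead of scanning the whole table
src = sampled_checksums(source, "target_table", ids)
dst = sampled_checksums(target, "target_table", ids)
diffs = [i for i in ids if src.get(i) != dst.get(i)]
if diffs:
    print(f"{len(diffs)} sampled rows differ; escalate to a full pt-table-checksum run")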
5.2 Recommended Reading
- MySQL official manual: Data Migration Techniques
- Percona Blog: Zero-Downtime MySQL Schema Changes
- Alibaba Cloud technical white paper: Financial-Grade Data Migration Specifications
Before any major migration, it is recommended to run at least three full-process drills and to establish a comprehensive emergency plan, so that a rollback can be completed within 30 minutes in the worst case.