Introduction
In a database system, durability is the "D" in the ACID properties. It promises that once a transaction commits successfully, its modifications take effect permanently and survive even a system crash or hardware failure. Keeping that promise, however, is not as simple as it sounds. Is committed data really safe from loss? The answer depends on the database's internal logging mechanism, its persistence strategy, the design of its distributed architecture, and the redundancy of the underlying hardware.
This article explores the implementation principles, potential risks, and optimization practices of durability after transaction commit, moving from stand-alone databases to distributed systems and from log flushing mechanisms to master-slave synchronization strategies, to give developers a comprehensive technical perspective.
1. Transaction Durability in a Stand-Alone Database
1.1 Core Components: Redo Log and Undo Log
Redo Log
- Role: records the physical changes a transaction makes to data pages; during crash recovery it is replayed to restore dirty pages that had not yet been flushed to disk.
- Persistence strategy: the `innodb_flush_log_at_trx_commit` parameter controls flushing behavior:
  - 0: at commit, the log is only written to the in-memory log buffer; a background thread flushes it to disk roughly once per second.
  - 1 (default): the log is written and synced to disk on every commit, so committed data survives a crash.
  - 2: at commit, the log is written to the operating system page cache; the OS flushes it to disk periodically.
Diagram 1: Redo Log flushing process
Transaction commit → Redo Log Buffer → (optional: Page Cache) → Disk
- If the parameter is 1: commit bypasses the page cache and forces a disk flush (`fsync`).
- If the parameter is 0 or 2: flushing relies on a background thread or the operating system, leaving a window in which committed data can be lost.
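The difference between the three settings can be sketched with a toy model (purely illustrative: `LogSim` and its methods are our own simplification, not InnoDB internals):

```python
class LogSim:
    """Toy model of innodb_flush_log_at_trx_commit (0, 1, or 2)."""

    def __init__(self, mode):
        self.mode = mode
        self.log_buffer = []   # InnoDB log buffer (lost if mysqld crashes)
        self.page_cache = []   # OS page cache (lost on power failure)
        self.disk = []         # durable storage

    def commit(self, record):
        self.log_buffer.append(record)
        if self.mode == 1:             # write + fsync on every commit
            self.disk.extend(self.log_buffer)
            self.log_buffer.clear()
        elif self.mode == 2:           # write to OS cache, no fsync
            self.page_cache.extend(self.log_buffer)
            self.log_buffer.clear()
        # mode 0: record stays in the log buffer until the background flush

    def background_flush(self):        # runs roughly once per second
        self.page_cache.extend(self.log_buffer)
        self.log_buffer.clear()
        self.disk.extend(self.page_cache)
        self.page_cache.clear()

    def power_loss(self):
        # Power failure: both the log buffer and the OS page cache are
        # gone. (Mode 2 would still survive a mysqld-only crash, since
        # the OS page cache outlives the database process.)
        self.log_buffer.clear()
        self.page_cache.clear()
        return self.disk


for mode in (0, 1, 2):
    sim = LogSim(mode)
    sim.commit("trx-1")
    print(mode, sim.power_loss())
# Only mode 1 retains trx-1 if power is lost right after commit.
```

Running it shows the loss window directly: modes 0 and 2 return an empty disk if the crash lands before the next flush, while mode 1 always has the committed record on disk.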
Undo Log
- Role: records the before-image of data modified by a transaction, used for transaction rollback and MVCC multi-version concurrency control.
- Persistence dependency: the undo log is itself protected by the redo log's persistence mechanism, which guarantees that rollback operations remain recoverable.
Key conclusions
- With `innodb_flush_log_at_trx_commit=1`, the redo log is on disk by the time the transaction commits, so durability is guaranteed.
- With 0 or 2, a crash after commit can lose up to about one second of transactions (0) or whatever the operating system had not yet flushed (2).
1.2 Binlog and Two-Phase Commit (2PC)
Binlog
- Role: records all logical data changes (e.g. SQL statements or row events), used for master-slave replication and point-in-time recovery.
- Persistence strategy: the `sync_binlog` parameter controls flushing behavior:
  - 0: relies on the operating system to flush the binlog to disk.
  - 1 (the default since MySQL 5.7.7): the binlog is synced to disk on every commit, so committed binlog events are not lost in a crash.
Diagram 2: Collaboration between Binlog and Redo Log
Transaction commit → Redo Log (Prepare) → Binlog write → Redo Log (Commit)
Two-Phase Commit (2PC)
To ensure the consistency of Redo Log and Binlog, MySQL adopts a two-phase commit protocol:
- Prepare phase:
  - The redo log record is written and flushed to disk, with its status marked as `prepare`.
- Commit phase:
  - The binlog is written and flushed to disk.
  - The redo log status is updated to `commit`.
Crash recovery logic
- Case 1: the binlog was not fully written → roll back the transaction.
- Case 2: the binlog was fully written but the redo log is still in `prepare` → complete the commit, since replicas may already have applied the transaction from the binlog.
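The recovery rules above amount to a small decision function. The sketch below is our own formulation (the function name and signature are illustrative, not MySQL source code):

```python
def recover(redo_state, binlog_complete):
    """Decide the fate of one transaction found during crash recovery.

    redo_state: 'prepare' or 'commit' (status recorded in the redo log).
    binlog_complete: True if the transaction's binlog event was fully
    written and synced before the crash.
    """
    if redo_state == "commit":
        return "already committed"    # nothing to do
    if binlog_complete:
        # Prepared redo + complete binlog: finish the commit, so the
        # master stays consistent with replicas reading the binlog.
        return "commit"
    return "rollback"                 # binlog missing: undo the work


print(recover("prepare", False))  # rollback
print(recover("prepare", True))   # commit
```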
Configuration Recommendations
- High-consistency scenario: `sync_binlog=1` + `innodb_flush_log_at_trx_commit=1`.
- High-performance scenario: `sync_binlog=N` (e.g. 100) + `innodb_flush_log_at_trx_commit=2`.
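As a concrete illustration, the high-consistency profile could be written as a my.cnf fragment (a sketch assembled from the parameters above; tune for your own environment):

```ini
[mysqld]
# High-consistency profile: fsync redo log and binlog on every commit
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

# High-performance alternative (trades a data-loss window for throughput):
# innodb_flush_log_at_trx_commit = 2
# sync_binlog = 100
```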
2. Persistence Challenges in Distributed Environments
2.1 Master-slave replication and data synchronization
Asynchronous Replication
- Mechanism: the master responds to the client immediately after commit; slaves pull and apply the logs asynchronously.
- Risk: if the master crashes, any changes not yet synchronized to a slave are permanently lost on failover.
Diagram 3: Data flow of asynchronous replication
Master node → commit transaction → respond to client
    ↓ (asynchronous)
Slave node → pull log → apply changes
Semi-Synchronous Replication
- Mechanism: after committing, the master waits for at least one slave to acknowledge receipt of the log before responding to the client.
- Configuration parameter: `rpl_semi_sync_master_wait_for_slave_count` (the number of slave acknowledgements to wait for).
Diagram 4: Interaction flow of semi-synchronous replication
Master node → commit transaction → wait for slave ACK → respond to client
    ↓ (synchronous)
Slave node → receive log → return ACK
Full-Synchronous Replication
- Mechanism: the master waits for every slave to confirm the write before responding to the client.
- Cost: high latency, and a single slow or failed slave reduces availability.
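The three modes differ only in when the master may acknowledge the client. That decision can be summarized in one illustrative function (the name, signature, and mode strings are our own, not a MySQL API):

```python
def can_ack_client(mode, acks_received, total_slaves, wait_for=1):
    """May the master acknowledge the commit to the client yet?

    wait_for models rpl_semi_sync_master_wait_for_slave_count.
    """
    if mode == "async":          # respond immediately; slaves catch up later
        return True
    if mode == "semi-sync":      # wait for `wait_for` slave ACKs
        return acks_received >= wait_for
    if mode == "full-sync":      # wait for every slave
        return acks_received >= total_slaves
    raise ValueError(f"unknown mode: {mode}")


print(can_ack_client("async", 0, 3))      # True: no ACKs needed
print(can_ack_client("semi-sync", 1, 3))  # True: one ACK suffices
print(can_ack_client("full-sync", 2, 3))  # False: one slave still pending
```

The function makes the availability trade-off explicit: full-sync blocks on the slowest replica, while async never blocks at all.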
2.2 Distributed Transactions and Consensus Algorithms
In cross-database or microservice scenarios, transaction persistence relies on distributed protocols:
- XA Protocol: coordinates multiple resource managers through two-phase commit (2PC); it carries blocking and single-point-of-failure risks at the coordinator.
- Paxos/Raft: guarantee log consistency through majority acknowledgement, as used by systems such as etcd and TiDB.
Diagram 5: Log replication process of the Raft algorithm
Leader → send log entry → majority of followers acknowledge → commit entry
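The majority rule in the diagram can be stated in a few lines. This is a minimal sketch of Raft's commit condition only (the function and variable names are ours, and real Raft adds term checks omitted here):

```python
def is_committed(follower_match_index, cluster_size, index):
    """A log index is committed once a majority of the cluster stores it.

    follower_match_index: highest replicated log index per follower;
    the leader counts itself as one replica.
    """
    replicas = 1 + sum(1 for m in follower_match_index if m >= index)
    return 2 * replicas > cluster_size


# 5-node cluster: leader plus followers at these replicated indexes
followers = [3, 3, 1, 0]
print(is_committed(followers, 5, 3))  # True: leader + 2 followers = 3 of 5
print(is_committed(followers, 5, 4))  # False: only the leader has index 4
```

This is why a Raft cluster of 2f+1 nodes tolerates f failures: any majority that commits an entry overlaps every future majority, so the entry cannot be lost.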
CAP Tradeoff
- CP systems (e.g. ZooKeeper): prioritize consistency and partition tolerance at the expense of availability.
- AP systems (e.g. Cassandra): prioritize availability and partition tolerance at the expense of consistency.
3. Impact of Hardware and Middleware
3.1 Disk Failure and Redundancy Solutions
- RAID Technology: Prevent data loss due to single disk failure through disk array redundancy (such as RAID 1/5/10).
- Persistent Storage: Use a battery-backed cache (BBWC) RAID card to prevent data loss during a power outage during the write process.
Diagram 6: RAID 1 mirror storage
Data block A → written to disk 1 and copied to disk 2
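The mirroring idea reduces to "write everywhere, read from any survivor". A toy model (illustrative only; real RAID operates on raw blocks, handles rebuilds, and lives in the controller or kernel):

```python
class Raid1:
    """Toy two-disk RAID 1 mirror: every write goes to both disks."""

    def __init__(self):
        self.disks = [dict(), dict()]   # block_id -> data, per disk

    def write(self, block_id, data):
        for disk in self.disks:         # mirror the write to every disk
            disk[block_id] = data

    def fail_disk(self, i):
        self.disks[i] = None            # simulate a single-disk failure

    def read(self, block_id):
        for disk in self.disks:         # any surviving mirror will do
            if disk is not None and block_id in disk:
                return disk[block_id]
        raise IOError("all mirrors lost")


r = Raid1()
r.write("A", b"payload")
r.fail_disk(0)
print(r.read("A"))  # b'payload', served from the surviving mirror
```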
3.2 Comparison of Middleware Persistence Strategies
Middleware | Persistence mechanism | Analogous database setting |
---|---|---|
Kafka | `acks=all` + `min.insync.replicas=N` | semi-synchronous replication |
Redis | `appendfsync always` | `sync_binlog=1` |
Elasticsearch | `index.translog.durability=request` | `innodb_flush_log_at_trx_commit=1` |
4. Optimization and Risk Avoidance in Practice
4.1 Parameter Tuning Case
Scenario 1: Financial Trading System
- need: Zero data loss, strong consistency.
- Configuration:
innodb_flush_log_at_trx_commit=1
sync_binlog=1
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_wait_for_slave_count=2
- Cost: TPS drops by roughly 30%; the performance loss must be compensated with hardware (SSDs, 10 GbE networking).
Performance comparison table

Configuration | TPS | Average latency | Data loss risk |
---|---|---|---|
Default parameters (asynchronous) | 10k | 5 ms | High |
Strong-consistency parameters | 7k | 15 ms | Low |
Scenario 2: Log Analysis System
- need: High throughput, allowing for minute-level data loss.
- Configuration:
innodb_flush_log_at_trx_commit=2
sync_binlog=100
innodb_flush_method=O_DIRECT_NO_FSYNC
- Benefit: write performance improves by roughly 50%–80%.
4.2 Monitoring and Disaster Recovery Solutions
Monitoring metrics:
- Redo log flush activity (the `Innodb_os_log_fsyncs` counter)
- Binlog synchronization position (`Master_Log_File` and `Read_Master_Log_Pos`)
- Slave replication lag (`Seconds_Behind_Master`)
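These metrics feed naturally into an alerting check. In the sketch below, `Seconds_Behind_Master` is the real field reported by `SHOW SLAVE STATUS`, while the function and threshold are hypothetical glue code of our own:

```python
def replication_alerts(slave_status, max_lag_seconds=10):
    """Return alert strings for a parsed SHOW SLAVE STATUS row.

    slave_status: dict of status fields; Seconds_Behind_Master is None
    when replication is not running.
    """
    alerts = []
    lag = slave_status.get("Seconds_Behind_Master")
    if lag is None:
        alerts.append("replication stopped (IO/SQL thread not running)")
    elif lag > max_lag_seconds:
        alerts.append(f"slave lag {lag}s exceeds {max_lag_seconds}s")
    return alerts


print(replication_alerts({"Seconds_Behind_Master": 42}))
print(replication_alerts({"Seconds_Behind_Master": 1}))   # [] - healthy
```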
Disaster recovery design:
- Cross-data-center disaster recovery: GTID-based master-slave synchronization achieves an RPO (recovery point objective) measured in seconds.
- Regular backups: Physical backup (Percona XtraBackup) + Binlog incremental backup.
Diagram 7: Cross-data center disaster recovery architecture
Main data center (Beijing) → synchronous log shipping → standby data center (Shanghai)
    ↓ asynchronous backup
Cloud storage (AWS S3)
5. Conclusion and Outlook
Durability after transaction commit is not absolute; it relies on guarantees at multiple levels:
- Single-machine level: crash recovery via the redo log, the binlog flushing strategy, and two-phase commit.
- Distributed level: multi-replica consistency via master-slave synchronization and consensus algorithms.
- Hardware level: RAID and battery-backed storage reduce the risk of physical failure.
Looking ahead, as non-volatile memory (NVM) and storage-compute integrated architectures mature, the overhead of log flushing may be greatly reduced, opening new ground for transaction durability. For now, though, sound configuration and architecture design remain the cornerstones of data safety.
References
- MySQL 8.0 Reference Manual: https://dev.mysql.com/doc/refman/8.0/en/
- Martin Kleppmann, "Designing Data-Intensive Applications": https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/
- TiDB Architecture Overview: https://docs.pingcap.com/tidb/stable/tidb-architecture
- Kafka Durability Guarantees: https://kafka.apache.org/documentation/#design_durability
- Redis Persistence: https://redis.io/topics/persistence
Appendix: Parameter Configuration Quick Check Table
Parameter | Safe value | Performance value |
---|---|---|
innodb_flush_log_at_trx_commit | 1 | 0 or 2 |
sync_binlog | 1 | 100 |
rpl_semi_sync_master_enabled | 1 | 0 |
Originality Statement
This article is based on public technical documentation and a summary of practical experience. Reproduction without permission is prohibited.