Closed
Labels
- CONFIRMED: The issue should be fixed, and it is able to reproduce
- IN_PROGRESS: A fix is being worked on and should be pushed shortly
- singularity
Description
We currently impose the following assumption:
- If the master has a validated shard block header, then the corresponding shard block is persisted in the slave's db
This assumption holds when the cluster is running properly. However, it can be broken if:
- the slave's machine/container crashes: even if the block has been put, it may not be persisted, because rocksdb's fsync is disabled by default (see "rocksdb may lose data after shutting down the machine/container directly", #486)
- the slave persists the block, but the cluster is shut down before the block header is added to the master. After the cluster restarts, the slave assumes the block has been validated by the cluster, when it actually has not
The second issue may be more common. A fix is to introduce a two-phase commit. In general, there are two phases:
- phase 1: write the block together with a record that the block is not yet committed
- phase 2: after the shard block has been broadcast to all slaves and the master, mark the block as committed
Additionally, after the cluster restarts, any blocks that are not committed will be scanned and recovered before the cluster becomes active.
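
The two phases and the restart scan can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual implementation: `SlaveDb`, `add_block`, `recover`, and the `crash_before_master` flag are hypothetical names, the store is an in-memory dict rather than rocksdb, and the master is modelled as a plain set of header hashes. The recovery policy shown (commit if the master already has the header, otherwise discard the block) is an assumption, since the issue does not specify how uncommitted blocks are recovered.

```python
from dataclasses import dataclass, field
from enum import Enum


class BlockState(Enum):
    PENDING = "pending"      # phase 1: written, not yet committed
    COMMITTED = "committed"  # phase 2: broadcast has completed


@dataclass
class SlaveDb:
    """Hypothetical stand-in for the slave's block store (a dict, not rocksdb)."""
    blocks: dict = field(default_factory=dict)  # block hash -> block bytes
    states: dict = field(default_factory=dict)  # block hash -> BlockState

    def put_pending(self, block_hash, block):
        # Phase 1: persist the block together with a "not committed" marker.
        self.blocks[block_hash] = block
        self.states[block_hash] = BlockState.PENDING

    def commit(self, block_hash):
        # Phase 2: flip the marker once the broadcast has completed.
        self.states[block_hash] = BlockState.COMMITTED

    def uncommitted(self):
        # Hashes of all blocks still in phase 1.
        return [h for h, s in self.states.items() if s is BlockState.PENDING]


def add_block(db, master_headers, block_hash, block, crash_before_master=False):
    """Add a block using the two phases; optionally simulate a shutdown."""
    db.put_pending(block_hash, block)
    if crash_before_master:
        return  # cluster went down before the header reached the master
    master_headers.add(block_hash)  # stands in for the broadcast to the master
    db.commit(block_hash)


def recover(db, master_headers):
    """Run before the cluster becomes active: resolve every uncommitted block."""
    for block_hash in db.uncommitted():
        if block_hash in master_headers:
            # The master knows the header, so the block can be committed.
            db.commit(block_hash)
        else:
            # Assumed policy: discard blocks the master never saw.
            del db.blocks[block_hash]
            del db.states[block_hash]
```

With this layout, a slave that restarts can no longer mistake a block that was merely written for one the cluster has validated: only blocks marked as committed are trusted, and everything else is resolved by `recover` first.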