summaryrefslogtreecommitdiff
path: root/utilities
diff options
context:
space:
mode:
authorcheng-chang <57096775+cheng-chang@users.noreply.github.com>2020-12-23 11:32:10 -0800
committerFacebook GitHub Bot <facebook-github-bot@users.noreply.github.com>2020-12-23 11:33:26 -0800
commitbdb7e544bd86c6648087d6b479a648a8c717cc9a (patch)
tree6156fe4ec69c99e71b2e632a7b1723001c879245 /utilities
parentbd2645bc3403b69e6a2782bbb6f62735bc0d9c13 (diff)
Skip WALs according to MinLogNumberToKeep when creating checkpoint (#7789)
Summary: In a stress test failure, we observe that a WAL is skipped when creating checkpoint, although its log number >= MinLogNumberToKeep(). This might happen in the following case: 1. when creating the checkpoint, there are 2 column families: CF0 and CF1, and there are 2 WALs: 1, 2; 2. CF0's log number is 1, CF0's active memtable is empty, CF1's log number is 2, CF1's active memtable is not empty, WAL 2 is not empty, the sequence number points to WAL 2; 2. the checkpoint process flushes CF0, since CF0' active memtable is empty, there is no need to SwitchMemtable, thus no new WAL will be created, so CF0's log number is now 2, concurrently, some data is written to CF0 and WAL 2; 3. the checkpoint process flushes CF1, WAL 3 is created and CF1's log number is now 3, CF0's log number is still 2 because CF0 is not empty and WAL 2 contains its unflushed data concurrently written in step 2; 4. the checkpoint process determines that WAL 1 and 2 are no longer needed according to [live_wal_files[i]->StartSequence() >= *sequence_number](https://github.com/facebook/rocksdb/blob/master/utilities/checkpoint/checkpoint_impl.cc#L388), so it skips linking them to the checkpoint directory; 5. but according to `MinLogNumberToKeep()`, WAL 2 still needs to be kept because CF0's log number is 2. If the checkpoint is reopened in read-only mode, and only read from the snapshot with the initial sequence number, then there will be no data loss or data inconsistency. But if the checkpoint is reopened and read from the most recent sequence number, suppose in step 3, there are also data concurrently written to CF1 and WAL 3, then the most recent sequence number refers to the latest entry in WAL 3, so the data written in step 2 should also be visible, but since WAL 2 is discarded, those data are lost. When tracking WAL in MANIFEST is enabled, when reopening the checkpoint, since WAL 2 is still tracked in MANIFEST as alive, but it's missing from the checkpoint directory, a corruption will be reported. This PR makes the checkpoint process to only skip a WAL if its log number < `MinLogNumberToKeep`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7789 Test Plan: watch existing tests to pass. Reviewed By: ajkr Differential Revision: D25662346 Pulled By: cheng-chang fbshipit-source-id: 136471095baa01886cf44809455cf855f24857a0
Diffstat (limited to 'utilities')
-rw-r--r--utilities/backupable/backupable_db_test.cc10
-rw-r--r--utilities/checkpoint/checkpoint_impl.cc12
2 files changed, 14 insertions, 8 deletions
diff --git a/utilities/backupable/backupable_db_test.cc b/utilities/backupable/backupable_db_test.cc
index 8d8ca95f8..1503a8c84 100644
--- a/utilities/backupable/backupable_db_test.cc
+++ b/utilities/backupable/backupable_db_test.cc
@@ -70,6 +70,16 @@ class DummyDB : public StackableDB {
DBOptions GetDBOptions() const override { return DBOptions(options_); }
+ using StackableDB::GetIntProperty;
+ bool GetIntProperty(ColumnFamilyHandle*, const Slice& property,
+ uint64_t* value) override {
+ if (property == DB::Properties::kMinLogNumberToKeep) {
+ *value = 1;
+ return true;
+ }
+ return false;
+ }
+
Status EnableFileDeletions(bool /*force*/) override {
EXPECT_TRUE(!deletions_enabled_);
deletions_enabled_ = true;
diff --git a/utilities/checkpoint/checkpoint_impl.cc b/utilities/checkpoint/checkpoint_impl.cc
index 2caea3037..d496da57c 100644
--- a/utilities/checkpoint/checkpoint_impl.cc
+++ b/utilities/checkpoint/checkpoint_impl.cc
@@ -232,14 +232,11 @@ Status CheckpointImpl::CreateCustomCheckpoint(
// this will return live_files prefixed with "/"
s = db_->GetLiveFiles(live_files, &manifest_file_size, flush_memtable);
+ if (!db_->GetIntProperty(DB::Properties::kMinLogNumberToKeep, &min_log_num)) {
+ return Status::InvalidArgument("cannot get the min log number to keep.");
+ }
+
if (s.ok() && db_options.allow_2pc) {
- // If 2PC is enabled, we need to get minimum log number after the flush.
- // Need to refetch the live files to recapture the snapshot.
- if (!db_->GetIntProperty(DB::Properties::kMinLogNumberToKeep,
- &min_log_num)) {
- return Status::InvalidArgument(
- "2PC enabled but cannot fine the min log number to keep.");
- }
// We need to refetch live files with flush to handle this case:
// A previous 000001.log contains the prepare record of transaction tnx1.
// The current log file is 000002.log, and sequence_number points to this
@@ -385,7 +382,6 @@ Status CheckpointImpl::CreateCustomCheckpoint(
for (size_t i = 0; s.ok() && i < wal_size; ++i) {
if ((live_wal_files[i]->Type() == kAliveLogFile) &&
(!flush_memtable ||
- live_wal_files[i]->StartSequence() >= *sequence_number ||
live_wal_files[i]->LogNumber() >= min_log_num)) {
if (i + 1 == wal_size) {
s = copy_file_cb(db_options.wal_dir, live_wal_files[i]->PathName(),