author | Yanqin Jin <yanqin@fb.com> | 2022-12-13 21:45:00 -0800 |
---|---|---|
committer | Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com> | 2022-12-13 21:45:00 -0800 |
commit | c93ba7db5ddc33f69f8f049cc59454985b17dc46 (patch) | |
tree | 8161ef502a8d76611bf6fbc66302eb2015595844 /db/db_impl/db_impl_write.cc | |
parent | 98d5db5c2ebed6ade6ff424215d7deae67f4593b (diff) |
Revise LockWAL/UnlockWAL implementation (#11020)
Summary:
RocksDB has two public APIs: `DB::LockWAL()`/`DB::UnlockWAL()`. The current implementation acquires and
releases the internal `DBImpl::log_write_mutex_`.
According to the comment on `DBImpl::log_write_mutex_`: https://github.com/facebook/rocksdb/blob/7.8.fb/db/db_impl/db_impl.h#L2287:L2288
> Note: to avoid dealock, if needed to acquire both log_write_mutex_ and mutex_, the order should be first mutex_ and then log_write_mutex_.
This limits how applications can use the `LockWAL()` API: after `LockWAL()` returns OK, the application
must not perform any operation that acquires `mutex_`. Currently, the use case of `LockWAL()` is MyRocks implementing
the MySQL storage engine handlerton `lock_hton_log` interface. The operation that MyRocks performs after `LockWAL()`
is `GetSortedWalFiles()`, which acquires not only `mutex_` but also `log_write_mutex_`.
There are two issues:
1. Applications using these two APIs may hang if one thread calls `GetSortedWalFiles()` after
calling `LockWAL()`, because `log_write_mutex_` is not recursive.
2. Two threads may deadlock due to lock order inversion.
To fix these issues, we can modify the implementation of `LockWAL()` so that it does not keep
`log_write_mutex_` held until `UnlockWAL()`. To achieve the goal of locking the WAL, we can
instead manually inject a write stall so that all future writes will be stopped.
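The idea can be sketched with a toy model. Everything below is hypothetical and is not RocksDB's code: `ToyDb`, `TryWrite`, `CountWalRecords`, and `wal_stall_count_` are illustrative names, and the write path is simplified to a non-blocking attempt that is refused while stalled, where a real write would wait for the stall to clear. The point it shows is that `LockWAL()` returns without holding any mutex, so the caller can afterwards run an operation that takes `mutex_` and then `log_write_mutex_`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Toy model only: a hypothetical stand-in, not RocksDB's DBImpl.
class ToyDb {
 public:
  // Stops future writes by raising a stall counter.
  // No mutex remains held once this returns.
  void LockWAL() {
    std::lock_guard<std::mutex> guard(mutex_);
    ++wal_stall_count_;
  }

  // Lifts one stall. Returns false if the WAL was not locked.
  bool UnlockWAL() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (wal_stall_count_ == 0) return false;
    --wal_stall_count_;
    return true;
  }

  // Non-blocking write: refused while a stall is active.
  // Lock order is mutex_ first, then log_write_mutex_.
  bool TryWrite(const std::string& record) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (wal_stall_count_ > 0) return false;
    std::lock_guard<std::mutex> log_guard(log_write_mutex_);
    wal_.push_back(record);
    return true;
  }

  // Takes both mutexes in the documented order, the pattern the
  // summary attributes to GetSortedWalFiles().
  std::size_t CountWalRecords() {
    std::lock_guard<std::mutex> guard(mutex_);
    std::lock_guard<std::mutex> log_guard(log_write_mutex_);
    return wal_.size();
  }

 private:
  std::mutex mutex_;
  std::mutex log_write_mutex_;
  int wal_stall_count_ = 0;        // > 0 means writes are stalled
  std::vector<std::string> wal_;   // stand-in for the WAL contents
};

// Usage: between LockWAL() and UnlockWAL() the caller may take both mutexes.
inline void Demo() {
  ToyDb db;
  assert(db.TryWrite("put k1"));
  db.LockWAL();
  assert(!db.TryWrite("put k2"));     // writes are stopped
  assert(db.CountWalRecords() == 1);  // does not hang
  assert(db.UnlockWAL());
  assert(db.TryWrite("put k3"));      // writes resume
}
```

Because the stall is a counter rather than a held mutex, nested `LockWAL()` calls in this sketch need matching `UnlockWAL()` calls before writes resume. Whether the real implementation counts calls this way is not stated in the summary above.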
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11020
Test Plan: make check
Reviewed By: ajkr
Differential Revision: D41785203
Pulled By: riversand963
fbshipit-source-id: 5ccb7a9c6eb9a2c3fa80fd2c399cc2568b8f89ce
Diffstat (limited to 'db/db_impl/db_impl_write.cc')
-rw-r--r-- | db/db_impl/db_impl_write.cc | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/db/db_impl/db_impl_write.cc b/db/db_impl/db_impl_write.cc
index a597c168d..cbeab046f 100644
--- a/db/db_impl/db_impl_write.cc
+++ b/db/db_impl/db_impl_write.cc
@@ -924,6 +924,15 @@ Status DBImpl::WriteImplWALOnly(
       write_thread->ExitAsBatchGroupLeader(write_group, status);
       return status;
     }
+  } else {
+    InstrumentedMutexLock lock(&mutex_);
+    Status status = DelayWrite(/*num_bytes=*/0ull, write_options);
+    if (!status.ok()) {
+      WriteThread::WriteGroup write_group;
+      write_thread->EnterAsBatchGroupLeader(&w, &write_group);
+      write_thread->ExitAsBatchGroupLeader(write_group, status);
+      return status;
+    }
   }
 
   WriteThread::WriteGroup write_group;
@@ -1762,6 +1771,7 @@ uint64_t DBImpl::GetMaxTotalWalSize() const {
 // REQUIRES: this thread is currently at the front of the writer queue
 Status DBImpl::DelayWrite(uint64_t num_bytes,
                           const WriteOptions& write_options) {
+  mutex_.AssertHeld();
   uint64_t time_delayed = 0;
   bool delayed = false;
   {