diff options
author | Changyu Bi <changyubi@meta.com> | 2023-10-12 16:55:25 -0700 |
---|---|---|
committer | Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com> | 2023-10-12 16:55:25 -0700 |
commit | 6e3429b8a6a53d5e477074057b5f27218063b5f2 (patch) | |
tree | fabeb5022bb872ea0e551564fa090342ccac465b /db/db_impl/db_impl.h | |
parent | 261e9be7b3d26f22084655cee76f32c304c6e2e9 (diff) |
Fix data race in accessing `recovery_in_prog_` (#11950)
Summary:
We saw the following TSAN stress test failure:
```
WARNING: ThreadSanitizer: data race (pid=17523)
Write of size 1 at 0x7b8c000008b9 by thread T4243 (mutexes: write M0):
#0 rocksdb::ErrorHandler::RecoverFromRetryableBGIOError() fbcode/internal_repo_rocksdb/repo/db/error_handler.cc:742 (db_stress+0x95f954) (BuildId: 35795dfb86ddc9c4f20ddf08a491f24d)
https://github.com/facebook/rocksdb/issues/1 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (rocksdb::ErrorHandler::*)(), rocksdb::ErrorHandler*>>>::_M_run() fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/invoke.h:74 (db_stress+0x95fc2b) (BuildId: 35795dfb86ddc9c4f20ddf08a491f24d)
https://github.com/facebook/rocksdb/issues/2 execute_native_thread_routine /home/engshare/third-party2/libgcc/11.x/src/gcc-11.x/x86_64-facebook-linux/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/thread.cc:82:18 (libstdc++.so.6+0xdf4e4) (BuildId: 452d1cdae868baeeb2fdf1ab140f1c219bf50c6e)
Previous read of size 1 at 0x7b8c000008b9 by thread T22:
#0 rocksdb::DBImpl::SyncClosedLogs(rocksdb::JobContext*, rocksdb::VersionEdit*) fbcode/internal_repo_rocksdb/repo/db/error_handler.h:76 (db_stress+0x84f69c) (BuildId: 35795dfb86ddc9c4f20ddf08a491f24d)
```
This is due to a data race in accessing `recovery_in_prog_`. This PR fixes it by accessing `recovery_in_prog_` under db mutex before calling `SyncClosedLogs()`. I think the original PR https://github.com/facebook/rocksdb/pull/10489 intended to clear the error if it's a recovery flush. So ideally we can also just check flush reason. I plan to keep a safer change in this PR and make that change in the future if needed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11950
Test Plan: check future TSAN stress test results.
Reviewed By: anand1976
Differential Revision: D50242255
Pulled By: cbi42
fbshipit-source-id: 0d487948ef9546b038a34460f3bb037f6e5bfc58
Diffstat (limited to 'db/db_impl/db_impl.h')
-rw-r--r-- | db/db_impl/db_impl.h | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/db/db_impl/db_impl.h b/db/db_impl/db_impl.h index 73f4d2e5d..19e82bc9c 100644 --- a/db/db_impl/db_impl.h +++ b/db/db_impl/db_impl.h @@ -1864,7 +1864,8 @@ class DBImpl : public DB { void ReleaseFileNumberFromPendingOutputs( std::unique_ptr<std::list<uint64_t>::iterator>& v); - IOStatus SyncClosedLogs(JobContext* job_context, VersionEdit* synced_wals); + IOStatus SyncClosedLogs(JobContext* job_context, VersionEdit* synced_wals, + bool error_recovery_in_prog); // Flush the in-memory write buffer to storage. Switches to a new // log-file/memtable and writes a new descriptor iff successful. Then |