author | Andrew Kryczka <ajkr@users.noreply.github.com> | 2022-08-04 00:42:13 -0700 |
---|---|---|
committer | Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com> | 2022-08-04 00:42:13 -0700 |
commit | 504fe4de80851074d472dd784e8cfccd22fd82e8 (patch) | |
tree | 80692666284ab1e3730d9ef79a6595db417ba20c /microbench | |
parent | d23752f672abe0287c42cc2985dac43a61172845 (diff) |
Avoid allocations/copies for large `GetMergeOperands()` results (#10458)
Summary:
This PR avoids allocations and copies for the result of `GetMergeOperands()` when the average operand size is at least 256 bytes and the total operands size is at least 32KB. `GetMergeOperands()` already returned results in `PinnableSlice`s but was calling `PinSelf()` (i.e., allocating and copying) for each operand. When this optimization takes effect, we instead call `PinSlice()` to skip that allocation and copy. Resources are pinned so that each `PinnableSlice` points to valid memory even after `GetMergeOperands()` returns.
The pinned resources include a referenced `SuperVersion`, a `MergingContext`, and a `PinnedIteratorsManager`. They are bundled into a `GetMergeOperandsState`. We use `SharedCleanablePtr` to share that bundle among all `PinnableSlice`s populated by `GetMergeOperands()`. That way, the last `PinnableSlice` to be `Reset()` will clean up the bundle, including unreferencing the `SuperVersion`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10458
Test Plan:
- new DB level test
- measured benefit/regression in a number of memtable scenarios
Setup command:
```
./db_bench -benchmarks=mergerandom -merge_operator=StringAppendOperator -num=$num -writes=16384 -key_size=16 -value_size=$value_sz -compression_type=none -write_buffer_size=1048576000
```
Benchmark command:
```
./db_bench -threads=$threads -use_existing_db=true -avoid_flush_during_recovery=true -write_buffer_size=1048576000 -benchmarks=readrandomoperands -merge_operator=StringAppendOperator -num=$num -duration=10
```
Worst regression is when a key has many tiny operands:
- Parameters: num=1 (implying 16384 operands per key), value_sz=8, threads=1
- `GetMergeOperands()` latency increases from 682 micros to 800 micros (+17%).
The regression disappears into the noise (<1% difference) if we remove the `Reset()` loop and the size-counting loop. The former is arguably needed regardless of this PR, as the convention in `Get()` and `MultiGet()` is to `Reset()` the input `PinnableSlice`s at the start. The latter could be optimized to count the size as we accumulate operands rather than after the fact.
Best improvement is when a key has large operands and high concurrency:
- Parameters: num=4 (implying 4096 operands per key), value_sz=2KB, threads=32
- `GetMergeOperands()` latency decreases from 11492 micros to 437 micros (-96%).
Reviewed By: cbi42
Differential Revision: D38336578
Pulled By: ajkr
fbshipit-source-id: 48146d127e04cb7f2d4d2939a2b9dff3aba18258
Diffstat (limited to 'microbench')
-rw-r--r-- | microbench/db_basic_bench.cc | 6 |
1 file changed, 6 insertions, 0 deletions
```
diff --git a/microbench/db_basic_bench.cc b/microbench/db_basic_bench.cc
index a7aa64774..6c70ad21d 100644
--- a/microbench/db_basic_bench.cc
+++ b/microbench/db_basic_bench.cc
@@ -841,6 +841,9 @@ static void DBGetMergeOperandsInMemtable(benchmark::State& state) {
     if (num_value_operands != static_cast<int>(kNumEntriesPerKey)) {
       state.SkipWithError("Unexpected number of merge operands found for key");
     }
+    for (auto& value_operand : value_operands) {
+      value_operand.Reset();
+    }
   }
 
   if (state.thread_index() == 0) {
@@ -938,6 +941,9 @@ static void DBGetMergeOperandsInSstFile(benchmark::State& state) {
     if (num_value_operands != static_cast<int>(kNumEntriesPerKey)) {
       state.SkipWithError("Unexpected number of merge operands found for key");
     }
+    for (auto& value_operand : value_operands) {
+      value_operand.Reset();
+    }
   }
 
   if (state.thread_index() == 0) {
```