| author | Peter Dillinger <peterd@meta.com> | 2023-12-14 22:13:32 -0800 |
|---|---|---|
| committer | Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com> | 2023-12-14 22:13:32 -0800 |
| commit | 88bc91f3cc2b492b8a45ba2c49650f527df97ad8 (patch) | |
| tree | fe23f5a35a716feb95801e313d7d986634b153f6 /cache | |
| parent | cd577f605948894b51fbaab39d1df03a04dfd70f (diff) | |
Cap eviction effort (CPU under stress) in HyperClockCache (#12141)
Summary:
HyperClockCache is intended to mitigate performance problems under stress conditions (as well as to optimize average-case parallel performance). In LRUCache, the biggest such problem is lock contention when one or a small number of cache entries becomes particularly hot. Regardless of cache sharding, accesses to any particular cache entry are serialized through a single mutex, which is held while each access updates the LRU list. All HCC variants are fully lock/wait-free for accessing blocks already in the cache, which fully mitigates this contention problem.
However, HCC (and CLOCK in general) can exhibit extremely degraded performance under a different stress condition: when no (or almost no) entries in a cache shard are evictable because they are pinned. Unlike LRU, which can find an evictable entry immediately (at the cost of more coordination/synchronization on each access), CLOCK has to search for evictable entries. Under the right conditions (almost exclusively MB-scale caches, not GB-scale), performance can fall off a cliff: the CPU cost of each cache miss balloons and bogs down the whole system.
To effectively mitigate this problem (IMHO), I'm introducing a new default behavior and tuning parameter for HCC, `eviction_effort_cap`. See the comments on the new config parameter in the public API.
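For illustration, here is a minimal sketch of setting the new parameter when constructing the cache. The helper name and the capacity are placeholders; `eviction_effort_cap` is the field added by this change, and the rest is existing `HyperClockCacheOptions` API:

```
#include <memory>

#include "rocksdb/cache.h"

using ROCKSDB_NAMESPACE::Cache;
using ROCKSDB_NAMESPACE::HyperClockCacheOptions;

// Hypothetical helper: builds an auto-variant HyperClockCache with the
// eviction effort cap set explicitly (30 is also the new default).
std::shared_ptr<Cache> MakeCappedHyperClockCache() {
  HyperClockCacheOptions opts(/*_capacity=*/size_t{1} << 30,
                              /*_estimated_entry_charge=*/0);  // 0 => AutoHCC
  // Cap CPU spent searching for evictable entries when (almost) all entries
  // in a shard are pinned. See the public API comments for details.
  opts.eviction_effort_cap = 30;
  return opts.MakeSharedCache();
}
```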
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12141
Test Plan:
unit test included
## Performance test
We can use cache_bench to validate that there is no regression (CPU and memory) in normal operation, and to measure the change in behavior when the cache is almost entirely pinned. (TODO: I'm not sure why I had to push the pinned ratio parameter well over 1.0 to see truly bad performance, but the behavior is there.) Build with `make DEBUG_LEVEL=0 USE_CLANG=1 PORTABLE=0 cache_bench`. We also set MALLOC_CONF="narenas:1" for all these runs to remove most jemalloc variance from the results, so that the max RSS reported by /usr/bin/time is close to ideal (assuming the allocator keeps fragmentation and other memory overheads low). Base command reproducing the bad behavior:
```
./cache_bench -cache_type=auto_hyper_clock_cache -threads=12 -histograms=0 -pinned_ratio=1.7
```
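The exact measurement wrapper is not shown; an assumed form, combining the MALLOC_CONF setting and /usr/bin/time mentioned above, would be:

```
MALLOC_CONF="narenas:1" /usr/bin/time \
  ./cache_bench -cache_type=auto_hyper_clock_cache -threads=12 -histograms=0 -pinned_ratio=1.7
```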
```
Before, LRU (alternate baseline not exhibiting bad behavior):
Rough parallel ops/sec = 2290997
1088060 maxresident
Before, AutoHCC (bad behavior):
Rough parallel ops/sec = 141011 <- Yes, more than 10x slower
1083932 maxresident
```
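The "After" numbers below were presumably gathered by appending the new `-eviction_effort_cap` flag (added to cache_bench in this change) to the same base command, e.g.:

```
./cache_bench -cache_type=auto_hyper_clock_cache -threads=12 -histograms=0 -pinned_ratio=1.7 \
  -eviction_effort_cap=30
```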
Now let us sample a range of values in the solution space:
```
After, AutoHCC, eviction_effort_cap = 1:
Rough parallel ops/sec = 3212586
2402216 maxresident
After, AutoHCC, eviction_effort_cap = 10:
Rough parallel ops/sec = 2371639
1248884 maxresident
After, AutoHCC, eviction_effort_cap = 30:
Rough parallel ops/sec = 1981092
1131596 maxresident
After, AutoHCC, eviction_effort_cap = 100:
Rough parallel ops/sec = 1446188
1090976 maxresident
After, AutoHCC, eviction_effort_cap = 1000:
Rough parallel ops/sec = 549568
1084064 maxresident
```
It looks like `cap=30` is a sweet spot balancing acceptable CPU and memory overheads, so it is chosen as the default.
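To make the meaning of that value concrete: the check added in clock_cache.cc (see the diff below) is essentially a ratio test, giving up on the current eviction attempt once the number of pinned entries scanned reaches roughly `eviction_effort_cap` times the number of entries actually freed, with a small start-up allowance:

```
bool IsEvictionEffortExceeded(const BaseClockTable::EvictionData& data,
                              uint32_t eviction_effort_cap) {
  // Basically checks whether the ratio of useful effort to wasted effort is
  // too low, with a start-up allowance for wasted effort before any useful
  // effort.
  return (data.freed_count + 1U) * uint64_t{eviction_effort_cap} <=
         data.seen_pinned_count;
}
```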
```
Change to -pinned_ratio=0.85
Before, LRU:
Rough parallel ops/sec = 2108373
1078232 maxresident
Before, AutoHCC, averaged over ~20 runs:
Rough parallel ops/sec = 2164910
1077312 maxresident
After, AutoHCC, eviction_effort_cap = 30, averaged over ~20 runs:
Rough parallel ops/sec = 2145542
1077216 maxresident
```
The CPU difference above is slight (about 1% lower ops/sec than uncapped AutoHCC, still ahead of the LRU baseline), consistent with the cap adding only minor overhead, and there is no measurable memory overhead under moderate stress.
```
Change to -pinned_ratio=0.25 (low stress)
Before, AutoHCC, averaged over ~20 runs:
Rough parallel ops/sec = 2221149
1076540 maxresident
After, AutoHCC, eviction_effort_cap = 30, averaged over ~20 runs:
Rough parallel ops/sec = 2224521
1076664 maxresident
```
No measurable difference under normal circumstances.
Some of these tests were repeated with FixedHCC, with similar results.
Reviewed By: anand1976
Differential Revision: D52174755
Pulled By: pdillinger
fbshipit-source-id: d278108031b1220c1fa4c89c5a9d34b7cf4ef1b8
Diffstat (limited to 'cache')
| -rw-r--r-- | cache/cache_bench_tool.cc | 5 |
|---|---|---|
| -rw-r--r-- | cache/clock_cache.cc | 135 |
| -rw-r--r-- | cache/clock_cache.h | 68 |
| -rw-r--r-- | cache/compressed_secondary_cache_test.cc | 2 |
| -rw-r--r-- | cache/lru_cache_test.cc | 60 |
5 files changed, 208 insertions, 62 deletions
diff --git a/cache/cache_bench_tool.cc b/cache/cache_bench_tool.cc index 89945abf7..07cd1a1f6 100644 --- a/cache/cache_bench_tool.cc +++ b/cache/cache_bench_tool.cc @@ -48,6 +48,10 @@ DEFINE_uint64(cache_size, 1 * GiB, "Number of bytes to use as a cache of uncompressed data."); DEFINE_int32(num_shard_bits, -1, "ShardedCacheOptions::shard_bits. Default = auto"); +DEFINE_int32( + eviction_effort_cap, + ROCKSDB_NAMESPACE::HyperClockCacheOptions(1, 1).eviction_effort_cap, + "HyperClockCacheOptions::eviction_effort_cap"); DEFINE_double(resident_ratio, 0.25, "Ratio of keys fitting in cache to keyspace."); @@ -391,6 +395,7 @@ class CacheBench { FLAGS_cache_size, /*estimated_entry_charge=*/0, FLAGS_num_shard_bits); opts.hash_seed = BitwiseAnd(FLAGS_seed, INT32_MAX); opts.memory_allocator = allocator; + opts.eviction_effort_cap = FLAGS_eviction_effort_cap; if (FLAGS_cache_type == "fixed_hyper_clock_cache" || FLAGS_cache_type == "hyper_clock_cache") { opts.estimated_entry_charge = FLAGS_value_bytes_estimate > 0 diff --git a/cache/clock_cache.cc b/cache/clock_cache.cc index e37c03fb5..de2e56186 100644 --- a/cache/clock_cache.cc +++ b/cache/clock_cache.cc @@ -15,6 +15,7 @@ #include <cassert> #include <cinttypes> #include <cstddef> +#include <cstdint> #include <exception> #include <functional> #include <numeric> @@ -94,7 +95,8 @@ inline void Unref(const ClockHandle& h, uint64_t count = 1) { (void)old_meta; } -inline bool ClockUpdate(ClockHandle& h, bool* purgeable = nullptr) { +inline bool ClockUpdate(ClockHandle& h, BaseClockTable::EvictionData* data, + bool* purgeable = nullptr) { uint64_t meta; if (purgeable) { assert(*purgeable == false); @@ -126,6 +128,7 @@ inline bool ClockUpdate(ClockHandle& h, bool* purgeable = nullptr) { (meta >> ClockHandle::kReleaseCounterShift) & ClockHandle::kCounterMask; if (acquire_count != release_count) { // Only clock update entries with no outstanding refs + data->seen_pinned_count++; return false; } if ((meta >> ClockHandle::kStateShift == ClockHandle::kStateVisible) && @@ -149,6 +152,8 @@ inline bool ClockUpdate(ClockHandle& h, bool* purgeable = nullptr) { << ClockHandle::kStateShift) | (meta & ClockHandle::kHitBitMask))) { // Took ownership. + data->freed_charge += h.GetTotalCharge(); + data->freed_count += 1; return true; } else { // Compare-exchange failing probably @@ -356,6 +361,18 @@ void ConstApplyToEntriesRange(const Func& func, const HandleImpl* begin, } } +constexpr uint32_t kStrictCapacityLimitBit = 1u << 31; + +uint32_t SanitizeEncodeEecAndScl(int eviction_effort_cap, + bool strict_capacit_limit) { + eviction_effort_cap = std::max(int{1}, eviction_effort_cap); + eviction_effort_cap = + std::min(static_cast<int>(~kStrictCapacityLimitBit), eviction_effort_cap); + uint32_t eec_and_scl = static_cast<uint32_t>(eviction_effort_cap); + eec_and_scl |= strict_capacit_limit ? 
kStrictCapacityLimitBit : 0; + return eec_and_scl; +} + } // namespace void ClockHandleBasicData::FreeData(MemoryAllocator* allocator) const { @@ -385,17 +402,20 @@ HandleImpl* BaseClockTable::StandaloneInsert( template <class Table> typename Table::HandleImpl* BaseClockTable::CreateStandalone( - ClockHandleBasicData& proto, size_t capacity, bool strict_capacity_limit, + ClockHandleBasicData& proto, size_t capacity, uint32_t eec_and_scl, bool allow_uncharged) { Table& derived = static_cast<Table&>(*this); typename Table::InsertState state; derived.StartInsert(state); const size_t total_charge = proto.GetTotalCharge(); - if (strict_capacity_limit) { + // NOTE: we can use eec_and_scl as eviction_effort_cap below because + // strict_capacity_limit=true is supposed to disable the limit on eviction + // effort, and a large value effectively does that. + if (eec_and_scl & kStrictCapacityLimitBit) { Status s = ChargeUsageMaybeEvictStrict<Table>( total_charge, capacity, - /*need_evict_for_occupancy=*/false, state); + /*need_evict_for_occupancy=*/false, eec_and_scl, state); if (!s.ok()) { if (allow_uncharged) { proto.total_charge = 0; @@ -407,7 +427,7 @@ typename Table::HandleImpl* BaseClockTable::CreateStandalone( // Case strict_capacity_limit == false bool success = ChargeUsageMaybeEvictNonStrict<Table>( total_charge, capacity, - /*need_evict_for_occupancy=*/false, state); + /*need_evict_for_occupancy=*/false, eec_and_scl, state); if (!success) { // Force the issue usage_.FetchAddRelaxed(total_charge); @@ -420,7 +440,7 @@ typename Table::HandleImpl* BaseClockTable::CreateStandalone( template <class Table> Status BaseClockTable::ChargeUsageMaybeEvictStrict( size_t total_charge, size_t capacity, bool need_evict_for_occupancy, - typename Table::InsertState& state) { + uint32_t eviction_effort_cap, typename Table::InsertState& state) { if (total_charge > capacity) { return Status::MemoryLimit( "Cache entry too large for a single cache shard: " + @@ -445,7 +465,8 @@ Status BaseClockTable::ChargeUsageMaybeEvictStrict( } if (request_evict_charge > 0) { EvictionData data; - static_cast<Table*>(this)->Evict(request_evict_charge, state, &data); + static_cast<Table*>(this)->Evict(request_evict_charge, state, &data, + eviction_effort_cap); occupancy_.FetchSub(data.freed_count); if (LIKELY(data.freed_charge > need_evict_charge)) { assert(data.freed_count > 0); @@ -475,7 +496,7 @@ Status BaseClockTable::ChargeUsageMaybeEvictStrict( template <class Table> inline bool BaseClockTable::ChargeUsageMaybeEvictNonStrict( size_t total_charge, size_t capacity, bool need_evict_for_occupancy, - typename Table::InsertState& state) { + uint32_t eviction_effort_cap, typename Table::InsertState& state) { // For simplicity, we consider that either the cache can accept the insert // with no evictions, or we must evict enough to make (at least) enough // space. 
It could lead to unnecessary failures or excessive evictions in @@ -511,7 +532,8 @@ inline bool BaseClockTable::ChargeUsageMaybeEvictNonStrict( } EvictionData data; if (need_evict_charge > 0) { - static_cast<Table*>(this)->Evict(need_evict_charge, state, &data); + static_cast<Table*>(this)->Evict(need_evict_charge, state, &data, + eviction_effort_cap); // Deal with potential occupancy deficit if (UNLIKELY(need_evict_for_occupancy) && data.freed_count == 0) { assert(data.freed_charge == 0); @@ -530,11 +552,7 @@ inline bool BaseClockTable::ChargeUsageMaybeEvictNonStrict( return true; } -void BaseClockTable::TrackAndReleaseEvictedEntry( - ClockHandle* h, BaseClockTable::EvictionData* data) { - data->freed_charge += h->GetTotalCharge(); - data->freed_count += 1; - +void BaseClockTable::TrackAndReleaseEvictedEntry(ClockHandle* h) { bool took_value_ownership = false; if (eviction_callback_) { // For key reconstructed from hash @@ -551,11 +569,20 @@ void BaseClockTable::TrackAndReleaseEvictedEntry( MarkEmpty(*h); } +bool IsEvictionEffortExceeded(const BaseClockTable::EvictionData& data, + uint32_t eviction_effort_cap) { + // Basically checks whether the ratio of useful effort to wasted effort is + // too low, with a start-up allowance for wasted effort before any useful + // effort. + return (data.freed_count + 1U) * uint64_t{eviction_effort_cap} <= + data.seen_pinned_count; +} + template <class Table> Status BaseClockTable::Insert(const ClockHandleBasicData& proto, typename Table::HandleImpl** handle, Cache::Priority priority, size_t capacity, - bool strict_capacity_limit) { + uint32_t eec_and_scl) { using HandleImpl = typename Table::HandleImpl; Table& derived = static_cast<Table&>(*this); @@ -573,9 +600,12 @@ Status BaseClockTable::Insert(const ClockHandleBasicData& proto, // strict_capacity_limit, but mostly pessimistic. bool use_standalone_insert = false; const size_t total_charge = proto.GetTotalCharge(); - if (strict_capacity_limit) { + // NOTE: we can use eec_and_scl as eviction_effort_cap below because + // strict_capacity_limit=true is supposed to disable the limit on eviction + // effort, and a large value effectively does that. 
+ if (eec_and_scl & kStrictCapacityLimitBit) { Status s = ChargeUsageMaybeEvictStrict<Table>( - total_charge, capacity, need_evict_for_occupancy, state); + total_charge, capacity, need_evict_for_occupancy, eec_and_scl, state); if (!s.ok()) { // Revert occupancy occupancy_.FetchSubRelaxed(1); @@ -584,7 +614,7 @@ Status BaseClockTable::Insert(const ClockHandleBasicData& proto, } else { // Case strict_capacity_limit == false bool success = ChargeUsageMaybeEvictNonStrict<Table>( - total_charge, capacity, need_evict_for_occupancy, state); + total_charge, capacity, need_evict_for_occupancy, eec_and_scl, state); if (!success) { // Revert occupancy occupancy_.FetchSubRelaxed(1); @@ -688,8 +718,7 @@ void BaseClockTable::TEST_ReleaseNMinus1(ClockHandle* h, size_t n) { #endif FixedHyperClockTable::FixedHyperClockTable( - size_t capacity, bool /*strict_capacity_limit*/, - CacheMetadataChargePolicy metadata_charge_policy, + size_t capacity, CacheMetadataChargePolicy metadata_charge_policy, MemoryAllocator* allocator, const Cache::EvictionCallback* eviction_callback, const uint32_t* hash_seed, const Opts& opts) @@ -1084,7 +1113,8 @@ inline void FixedHyperClockTable::ReclaimEntryUsage(size_t total_charge) { } inline void FixedHyperClockTable::Evict(size_t requested_charge, InsertState&, - EvictionData* data) { + EvictionData* data, + uint32_t eviction_effort_cap) { // precondition assert(requested_charge > 0); @@ -1105,10 +1135,10 @@ inline void FixedHyperClockTable::Evict(size_t requested_charge, InsertState&, for (;;) { for (size_t i = 0; i < step_size; i++) { HandleImpl& h = array_[ModTableSize(Lower32of64(old_clock_pointer + i))]; - bool evicting = ClockUpdate(h); + bool evicting = ClockUpdate(h, data); if (evicting) { Rollback(h.hashed_key, &h); - TrackAndReleaseEvictedEntry(&h, data); + TrackAndReleaseEvictedEntry(&h); } } @@ -1119,6 +1149,10 @@ inline void FixedHyperClockTable::Evict(size_t requested_charge, InsertState&, if (old_clock_pointer >= max_clock_pointer) { return; } + if (IsEvictionEffortExceeded(*data, eviction_effort_cap)) { + eviction_effort_exceeded_count_.FetchAddRelaxed(1); + return; + } // Advance clock pointer (concurrently) old_clock_pointer = clock_pointer_.FetchAddRelaxed(step_size); @@ -1133,10 +1167,11 @@ ClockCacheShard<Table>::ClockCacheShard( const Cache::EvictionCallback* eviction_callback, const uint32_t* hash_seed, const typename Table::Opts& opts) : CacheShardBase(metadata_charge_policy), - table_(capacity, strict_capacity_limit, metadata_charge_policy, allocator, - eviction_callback, hash_seed, opts), + table_(capacity, metadata_charge_policy, allocator, eviction_callback, + hash_seed, opts), capacity_(capacity), - strict_capacity_limit_(strict_capacity_limit) { + eec_and_scl_(SanitizeEncodeEecAndScl(opts.eviction_effort_cap, + strict_capacity_limit)) { // Initial charge metadata should not exceed capacity assert(table_.GetUsage() <= capacity_.LoadRelaxed() || capacity_.LoadRelaxed() < sizeof(HandleImpl)); @@ -1212,7 +1247,11 @@ void ClockCacheShard<Table>::SetCapacity(size_t capacity) { template <class Table> void ClockCacheShard<Table>::SetStrictCapacityLimit( bool strict_capacity_limit) { - strict_capacity_limit_.StoreRelaxed(strict_capacity_limit); + if (strict_capacity_limit) { + eec_and_scl_.FetchOrRelaxed(kStrictCapacityLimitBit); + } else { + eec_and_scl_.FetchAndRelaxed(~kStrictCapacityLimitBit); + } // next Insert will take care of any necessary evictions } @@ -1234,7 +1273,7 @@ Status ClockCacheShard<Table>::Insert(const Slice& key, proto.total_charge = 
charge; return table_.template Insert<Table>(proto, handle, priority, capacity_.LoadRelaxed(), - strict_capacity_limit_.LoadRelaxed()); + eec_and_scl_.LoadRelaxed()); } template <class Table> @@ -1249,9 +1288,9 @@ typename Table::HandleImpl* ClockCacheShard<Table>::CreateStandalone( proto.value = obj; proto.helper = helper; proto.total_charge = charge; - return table_.template CreateStandalone<Table>( - proto, capacity_.LoadRelaxed(), strict_capacity_limit_.LoadRelaxed(), - allow_uncharged); + return table_.template CreateStandalone<Table>(proto, capacity_.LoadRelaxed(), + eec_and_scl_.LoadRelaxed(), + allow_uncharged); } template <class Table> @@ -1503,14 +1542,20 @@ void BaseHyperClockCache<Table>::ReportProblems( const std::shared_ptr<Logger>& info_log) const { if (info_log->GetInfoLogLevel() <= InfoLogLevel::DEBUG_LEVEL) { LoadVarianceStats slot_stats; + uint64_t eviction_effort_exceeded_count = 0; this->ForEachShard([&](const BaseHyperClockCache<Table>::Shard* shard) { size_t count = shard->GetTableAddressCount(); for (size_t i = 0; i < count; ++i) { slot_stats.Add(IsSlotOccupied(*shard->GetTable().HandlePtr(i))); } + eviction_effort_exceeded_count += + shard->GetTable().GetEvictionEffortExceededCount(); }); ROCKS_LOG_AT_LEVEL(info_log, InfoLogLevel::DEBUG_LEVEL, "Slot occupancy stats: %s", slot_stats.Report().c_str()); + ROCKS_LOG_AT_LEVEL(info_log, InfoLogLevel::DEBUG_LEVEL, + "Eviction effort exceeded: %" PRIu64, + eviction_effort_exceeded_count); } } @@ -1908,8 +1953,7 @@ class AutoHyperClockTable::ChainRewriteLock { }; AutoHyperClockTable::AutoHyperClockTable( - size_t capacity, bool /*strict_capacity_limit*/, - CacheMetadataChargePolicy metadata_charge_policy, + size_t capacity, CacheMetadataChargePolicy metadata_charge_policy, MemoryAllocator* allocator, const Cache::EvictionCallback* eviction_callback, const uint32_t* hash_seed, const Opts& opts) @@ -2590,7 +2634,8 @@ using ClockUpdateChainLockedOpData = template <class OpData> void AutoHyperClockTable::PurgeImplLocked(OpData* op_data, ChainRewriteLock& rewrite_lock, - size_t home) { + size_t home, + BaseClockTable::EvictionData* data) { constexpr bool kIsPurge = std::is_same_v<OpData, PurgeLockedOpData>; constexpr bool kIsClockUpdateChain = std::is_same_v<OpData, ClockUpdateChainLockedOpData>; @@ -2632,7 +2677,7 @@ void AutoHyperClockTable::PurgeImplLocked(OpData* op_data, assert(home == BottomNBits(h->hashed_key[1], home_shift)); if constexpr (kIsClockUpdateChain) { // Clock update and/or check for purgeable (under (de)construction) - if (ClockUpdate(*h, &purgeable)) { + if (ClockUpdate(*h, data, &purgeable)) { // Remember for finishing eviction op_data->push_back(h); // Entries for eviction become purgeable @@ -2642,6 +2687,7 @@ void AutoHyperClockTable::PurgeImplLocked(OpData* op_data, } } else { (void)op_data; + (void)data; purgeable = ((h->meta.Load() >> ClockHandle::kStateShift) & ClockHandle::kStateShareableBit) == 0; } @@ -2719,7 +2765,8 @@ using PurgeOpData = const UniqueId64x2; using ClockUpdateChainOpData = ClockUpdateChainLockedOpData; template <class OpData> -void AutoHyperClockTable::PurgeImpl(OpData* op_data, size_t home) { +void AutoHyperClockTable::PurgeImpl(OpData* op_data, size_t home, + BaseClockTable::EvictionData* data) { // Early efforts to make AutoHCC fully wait-free ran into too many problems // that needed obscure and potentially inefficient work-arounds to have a // chance at working. 
@@ -2800,9 +2847,9 @@ void AutoHyperClockTable::PurgeImpl(OpData* op_data, size_t home) { if (!rewrite_lock.IsEnd()) { if constexpr (kIsPurge) { PurgeLockedOpData* locked_op_data{}; - PurgeImplLocked(locked_op_data, rewrite_lock, home); + PurgeImplLocked(locked_op_data, rewrite_lock, home, data); } else { - PurgeImplLocked(op_data, rewrite_lock, home); + PurgeImplLocked(op_data, rewrite_lock, home, data); } } } @@ -3405,7 +3452,8 @@ void AutoHyperClockTable::EraseUnRefEntries() { } void AutoHyperClockTable::Evict(size_t requested_charge, InsertState& state, - EvictionData* data) { + EvictionData* data, + uint32_t eviction_effort_cap) { // precondition assert(requested_charge > 0); @@ -3463,12 +3511,12 @@ void AutoHyperClockTable::Evict(size_t requested_charge, InsertState& state, if (home >= used_length) { break; } - PurgeImpl(&to_finish_eviction, home); + PurgeImpl(&to_finish_eviction, home, data); } } for (HandleImpl* h : to_finish_eviction) { - TrackAndReleaseEvictedEntry(h, data); + TrackAndReleaseEvictedEntry(h); // NOTE: setting likely_empty_slot here can cause us to reduce the // portion of "at home" entries, probably because an evicted entry // is more likely to come back than a random new entry and would be @@ -3496,6 +3544,11 @@ void AutoHyperClockTable::Evict(size_t requested_charge, InsertState& state, if (old_clock_pointer + step_size >= max_clock_pointer) { return; } + + if (IsEvictionEffortExceeded(*data, eviction_effort_cap)) { + eviction_effort_exceeded_count_.FetchAddRelaxed(1); + return; + } } } diff --git a/cache/clock_cache.h b/cache/clock_cache.h index 3086e7e97..54d656021 100644 --- a/cache/clock_cache.h +++ b/cache/clock_cache.h @@ -11,6 +11,7 @@ #include <array> #include <atomic> +#include <climits> #include <cstddef> #include <cstdint> #include <memory> @@ -374,6 +375,14 @@ struct ClockHandle : public ClockHandleBasicData { class BaseClockTable { public: + struct BaseOpts { + explicit BaseOpts(int _eviction_effort_cap) + : eviction_effort_cap(_eviction_effort_cap) {} + explicit BaseOpts(const HyperClockCacheOptions& opts) + : BaseOpts(opts.eviction_effort_cap) {} + int eviction_effort_cap; + }; + BaseClockTable(CacheMetadataChargePolicy metadata_charge_policy, MemoryAllocator* allocator, const Cache::EvictionCallback* eviction_callback, @@ -386,13 +395,13 @@ class BaseClockTable { template <class Table> typename Table::HandleImpl* CreateStandalone(ClockHandleBasicData& proto, size_t capacity, - bool strict_capacity_limit, + uint32_t eec_and_scl, bool allow_uncharged); template <class Table> Status Insert(const ClockHandleBasicData& proto, typename Table::HandleImpl** handle, Cache::Priority priority, - size_t capacity, bool strict_capacity_limit); + size_t capacity, uint32_t eec_and_scl); void Ref(ClockHandle& handle); @@ -406,12 +415,17 @@ class BaseClockTable { uint64_t GetYieldCount() const { return yield_count_.LoadRelaxed(); } + uint64_t GetEvictionEffortExceededCount() const { + return eviction_effort_exceeded_count_.LoadRelaxed(); + } + struct EvictionData { size_t freed_charge = 0; size_t freed_count = 0; + size_t seen_pinned_count = 0; }; - void TrackAndReleaseEvictedEntry(ClockHandle* h, EvictionData* data); + void TrackAndReleaseEvictedEntry(ClockHandle* h); #ifndef NDEBUG // Acquire N references @@ -436,6 +450,7 @@ class BaseClockTable { template <class Table> Status ChargeUsageMaybeEvictStrict(size_t total_charge, size_t capacity, bool need_evict_for_occupancy, + uint32_t eviction_effort_cap, typename Table::InsertState& state); // Helper for updating 
`usage_` for new entry with given `total_charge` @@ -449,6 +464,7 @@ class BaseClockTable { template <class Table> bool ChargeUsageMaybeEvictNonStrict(size_t total_charge, size_t capacity, bool need_evict_for_occupancy, + uint32_t eviction_effort_cap, typename Table::InsertState& state); protected: // data @@ -461,9 +477,15 @@ class BaseClockTable { RelaxedAtomic<uint64_t> clock_pointer_{}; // Counter for number of times we yield to wait on another thread. + // It is normal for this to occur rarely in normal operation. // (Relaxed: a simple stat counter.) RelaxedAtomic<uint64_t> yield_count_{}; + // Counter for number of times eviction effort cap is exceeded. + // It is normal for this to occur rarely in normal operation. + // (Relaxed: a simple stat counter.) + RelaxedAtomic<uint64_t> eviction_effort_exceeded_count_{}; + // TODO: is this separation needed if we don't do background evictions? ALIGN_AS(CACHE_LINE_SIZE) // Number of elements in the table. @@ -517,17 +539,19 @@ class FixedHyperClockTable : public BaseClockTable { inline void SetStandalone() { standalone = true; } }; // struct HandleImpl - struct Opts { - explicit Opts(size_t _estimated_value_size) - : estimated_value_size(_estimated_value_size) {} - explicit Opts(const HyperClockCacheOptions& opts) { + struct Opts : public BaseOpts { + explicit Opts(size_t _estimated_value_size, int _eviction_effort_cap) + : BaseOpts(_eviction_effort_cap), + estimated_value_size(_estimated_value_size) {} + explicit Opts(const HyperClockCacheOptions& opts) + : BaseOpts(opts.eviction_effort_cap) { assert(opts.estimated_entry_charge > 0); estimated_value_size = opts.estimated_entry_charge; } size_t estimated_value_size; }; - FixedHyperClockTable(size_t capacity, bool strict_capacity_limit, + FixedHyperClockTable(size_t capacity, CacheMetadataChargePolicy metadata_charge_policy, MemoryAllocator* allocator, const Cache::EvictionCallback* eviction_callback, @@ -549,7 +573,8 @@ class FixedHyperClockTable : public BaseClockTable { // Runs the clock eviction algorithm trying to reclaim at least // requested_charge. Returns how much is evicted, which could be less // if it appears impossible to evict the requested amount without blocking. - void Evict(size_t requested_charge, InsertState& state, EvictionData* data); + void Evict(size_t requested_charge, InsertState& state, EvictionData* data, + uint32_t eviction_effort_cap); HandleImpl* Lookup(const UniqueId64x2& hashed_key); @@ -803,18 +828,20 @@ class AutoHyperClockTable : public BaseClockTable { } }; // struct HandleImpl - struct Opts { - explicit Opts(size_t _min_avg_value_size) - : min_avg_value_size(_min_avg_value_size) {} + struct Opts : public BaseOpts { + explicit Opts(size_t _min_avg_value_size, int _eviction_effort_cap) + : BaseOpts(_eviction_effort_cap), + min_avg_value_size(_min_avg_value_size) {} - explicit Opts(const HyperClockCacheOptions& opts) { + explicit Opts(const HyperClockCacheOptions& opts) + : BaseOpts(opts.eviction_effort_cap) { assert(opts.estimated_entry_charge == 0); min_avg_value_size = opts.min_avg_entry_charge; } size_t min_avg_value_size; }; - AutoHyperClockTable(size_t capacity, bool strict_capacity_limit, + AutoHyperClockTable(size_t capacity, CacheMetadataChargePolicy metadata_charge_policy, MemoryAllocator* allocator, const Cache::EvictionCallback* eviction_callback, @@ -841,7 +868,8 @@ class AutoHyperClockTable : public BaseClockTable { // Runs the clock eviction algorithm trying to reclaim at least // requested_charge. 
Returns how much is evicted, which could be less // if it appears impossible to evict the requested amount without blocking. - void Evict(size_t requested_charge, InsertState& state, EvictionData* data); + void Evict(size_t requested_charge, InsertState& state, EvictionData* data, + uint32_t eviction_effort_cap); HandleImpl* Lookup(const UniqueId64x2& hashed_key); @@ -906,7 +934,8 @@ class AutoHyperClockTable : public BaseClockTable { // with proper handling to ensure all existing data is seen even in the // presence of concurrent insertions, etc. (See implementation.) template <class OpData> - void PurgeImpl(OpData* op_data, size_t home = SIZE_MAX); + void PurgeImpl(OpData* op_data, size_t home = SIZE_MAX, + EvictionData* data = nullptr); // An RAII wrapper for locking a chain of entries for removals. See // implementation. @@ -916,7 +945,7 @@ class AutoHyperClockTable : public BaseClockTable { // implementation. template <class OpData> void PurgeImplLocked(OpData* op_data, ChainRewriteLock& rewrite_lock, - size_t home); + size_t home, EvictionData* data); // Update length_info_ as much as possible without waiting, given a known // usable (ready for inserts and lookups) grow_home. (Previous grow_homes @@ -1078,9 +1107,10 @@ class ALIGN_AS(CACHE_LINE_SIZE) ClockCacheShard final : public CacheShardBase { // (Relaxed: eventual consistency/update is OK) RelaxedAtomic<size_t> capacity_; - // Whether to reject insertion if cache reaches its full capacity. + // Encodes eviction_effort_cap (bottom 31 bits) and strict_capacity_limit + // (top bit). See HyperClockCacheOptions::eviction_effort_cap etc. // (Relaxed: eventual consistency/update is OK) - RelaxedAtomic<bool> strict_capacity_limit_; + RelaxedAtomic<uint32_t> eec_and_scl_; }; // class ClockCacheShard template <class Table> diff --git a/cache/compressed_secondary_cache_test.cc b/cache/compressed_secondary_cache_test.cc index d72680b84..79f40868a 100644 --- a/cache/compressed_secondary_cache_test.cc +++ b/cache/compressed_secondary_cache_test.cc @@ -992,6 +992,8 @@ class CompressedSecCacheTestWithTiered /*_capacity=*/0, /*_estimated_entry_charge=*/256 << 10, /*_num_shard_bits=*/0); + // eviction_effort_cap setting simply to avoid churn in existing test + hcc_opts.eviction_effort_cap = 100; TieredCacheOptions opts; lru_opts.capacity = 0; lru_opts.num_shard_bits = 0; diff --git a/cache/lru_cache_test.cc b/cache/lru_cache_test.cc index 27fd5cc85..91b1d02c1 100644 --- a/cache/lru_cache_test.cc +++ b/cache/lru_cache_test.cc @@ -389,12 +389,13 @@ class ClockCacheTest : public testing::Test { } } - void NewShard(size_t capacity, bool strict_capacity_limit = true) { + void NewShard(size_t capacity, bool strict_capacity_limit = true, + int eviction_effort_cap = 30) { DeleteShard(); shard_ = reinterpret_cast<Shard*>(port::cacheline_aligned_alloc(sizeof(Shard))); - TableOpts opts{1 /*value_size*/}; + TableOpts opts{1 /*value_size*/, eviction_effort_cap}; new (shard_) Shard(capacity, strict_capacity_limit, kDontChargeCacheMetadata, /*allocator*/ nullptr, &eviction_callback_, &hash_seed_, opts); @@ -445,12 +446,20 @@ class ClockCacheTest : public testing::Test { return Slice(reinterpret_cast<const char*>(&hashed_key), 16U); } + // A bad hash function for testing / stressing collision handling static inline UniqueId64x2 TestHashedKey(char key) { // For testing hash near-collision behavior, put the variance in // hashed_key in bits that are unlikely to be used as hash bits. 
return {(static_cast<uint64_t>(key) << 56) + 1234U, 5678U}; } + // A reasonable hash function, for testing "typical behavior" etc. + template <typename T> + static inline UniqueId64x2 CheapHash(T i) { + return {static_cast<uint64_t>(i) * uint64_t{0x85EBCA77C2B2AE63}, + static_cast<uint64_t>(i) * uint64_t{0xC2B2AE3D27D4EB4F}}; + } + Shard* shard_ = nullptr; private: @@ -683,6 +692,53 @@ TYPED_TEST(ClockCacheTest, ClockEvictionTest) { } } +TYPED_TEST(ClockCacheTest, ClockEvictionEffortCapTest) { + using HandleImpl = typename ClockCacheTest<TypeParam>::Shard::HandleImpl; + for (bool strict_capacity_limit : {true, false}) { + SCOPED_TRACE("strict_capacity_limit = " + + std::to_string(strict_capacity_limit)); + for (int eec : {-42, 0, 1, 10, 100, 1000}) { + SCOPED_TRACE("eviction_effort_cap = " + std::to_string(eec)); + constexpr size_t kCapacity = 1000; + // Start with much larger capacity to ensure that we can go way over + // capacity without reaching table occupancy limit. + this->NewShard(3 * kCapacity, strict_capacity_limit, eec); + auto& shard = *this->shard_; + shard.SetCapacity(kCapacity); + + // Nearly fill the cache with pinned entries, then add a bunch of + // non-pinned entries. eviction_effort_cap should affect how many + // evictable entries are present beyond the cache capacity, despite + // being evictable. + constexpr size_t kCount = kCapacity - 1; + std::unique_ptr<HandleImpl* []> ha { new HandleImpl* [kCount] {} }; + for (size_t i = 0; i < 2 * kCount; ++i) { + UniqueId64x2 hkey = this->CheapHash(i); + ASSERT_OK(shard.Insert( + this->TestKey(hkey), hkey, nullptr /*value*/, &kNoopCacheItemHelper, + 1 /*charge*/, i < kCount ? &ha[i] : nullptr, Cache::Priority::LOW)); + } + + if (strict_capacity_limit) { + // If strict_capacity_limit is enabled, the cache will never exceed its + // capacity + EXPECT_EQ(shard.GetOccupancyCount(), kCapacity); + } else { + // Rough inverse relationship between cap and possible memory + // explosion, which shows up as increased table occupancy count. + int effective_eec = std::max(int{1}, eec) + 1; + EXPECT_NEAR(shard.GetOccupancyCount() * 1.0, + kCount * (1 + 1.4 / effective_eec), + kCount * (0.6 / effective_eec) + 1.0); + } + + for (size_t i = 0; i < kCount; ++i) { + shard.Release(ha[i]); + } + } + } +} + namespace { struct DeleteCounter { int deleted = 0; |