author     Peter Dillinger <peterd@meta.com>  2023-12-14 22:13:32 -0800
committer  Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>  2023-12-14 22:13:32 -0800
commit     88bc91f3cc2b492b8a45ba2c49650f527df97ad8 (patch)
tree       fe23f5a35a716feb95801e313d7d986634b153f6 /cache
parent     cd577f605948894b51fbaab39d1df03a04dfd70f (diff)
Cap eviction effort (CPU under stress) in HyperClockCache (#12141)
Summary:
HyperClockCache is intended to mitigate performance problems under stress conditions (as well as optimizing average-case parallel performance). In LRUCache, the biggest such problem is lock contention when one or a small number of cache entries becomes particularly hot. Regardless of cache sharding, accesses to any particular cache entry are linearized against a single mutex, which is held while each access updates the LRU list. All HCC variants are fully lock/wait-free for accessing blocks already in the cache, which fully mitigates this contention problem.

However, HCC (and CLOCK in general) can exhibit extremely degraded performance under a different stress condition: when no (or almost no) entries in a cache shard are evictable (they are pinned). Unlike LRU, which can find any evictable entries immediately (at the cost of more coordination / synchronization on each access), CLOCK has to search for evictable entries. Under the right conditions (almost exclusively MB-scale caches, not GB-scale), the CPU cost of each cache miss could fall off a cliff and bog down the whole system.

To effectively mitigate this problem (IMHO), I'm introducing a new default behavior and tuning parameter for HCC, `eviction_effort_cap`. See the comments on the new config parameter in the public API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12141

Test Plan: unit test included

## Performance test

We can use cache_bench to validate no regression (CPU and memory) in normal operation, and to measure the change in behavior when the cache is almost entirely pinned. (TODO: I'm not sure why I had to get the pinned ratio parameter well over 1.0 to see truly bad behavior, but the behavior is there.) Build with `make DEBUG_LEVEL=0 USE_CLANG=1 PORTABLE=0 cache_bench`. We also set MALLOC_CONF="narenas:1" for all these runs to essentially remove jemalloc variances from the results, so that the max RSS given by /usr/bin/time is essentially ideal (assuming the allocator minimizes fragmentation and other memory overheads well).

Base command reproducing bad behavior:
```
./cache_bench -cache_type=auto_hyper_clock_cache -threads=12 -histograms=0 -pinned_ratio=1.7
```
```
Before, LRU (alternate baseline not exhibiting bad behavior):
Rough parallel ops/sec = 2290997
1088060 maxresident

Before, AutoHCC (bad behavior):
Rough parallel ops/sec = 141011  <- Yes, more than 10x slower
1083932 maxresident
```

Now let us sample a range of values in the solution space:
```
After, AutoHCC, eviction_effort_cap = 1:
Rough parallel ops/sec = 3212586
2402216 maxresident

After, AutoHCC, eviction_effort_cap = 10:
Rough parallel ops/sec = 2371639
1248884 maxresident

After, AutoHCC, eviction_effort_cap = 30:
Rough parallel ops/sec = 1981092
1131596 maxresident

After, AutoHCC, eviction_effort_cap = 100:
Rough parallel ops/sec = 1446188
1090976 maxresident

After, AutoHCC, eviction_effort_cap = 1000:
Rough parallel ops/sec = 549568
1084064 maxresident
```

It looks like `cap=30` is a sweet spot balancing acceptable CPU and memory overheads, so it is chosen as the default.

```
Change to -pinned_ratio=0.85

Before, LRU:
Rough parallel ops/sec = 2108373
1078232 maxresident

Before, AutoHCC, averaged over ~20 runs:
Rough parallel ops/sec = 2164910
1077312 maxresident

After, AutoHCC, eviction_effort_cap = 30, averaged over ~20 runs:
Rough parallel ops/sec = 2145542
1077216 maxresident
```

The slight CPU improvement above is consistent with the cap, with no measurable memory overhead under moderate stress.

```
Change to -pinned_ratio=0.25 (low stress)

Before, AutoHCC, averaged over ~20 runs:
Rough parallel ops/sec = 2221149
1076540 maxresident

After, AutoHCC, eviction_effort_cap = 30, averaged over ~20 runs:
Rough parallel ops/sec = 2224521
1076664 maxresident
```

No measurable difference under normal circumstances.

Some tests repeated with FixedHCC, with similar results.

Reviewed By: anand1976

Differential Revision: D52174755

Pulled By: pdillinger

fbshipit-source-id: d278108031b1220c1fa4c89c5a9d34b7cf4ef1b8
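For context, here is roughly how a user opts into the new knob through the public API (an illustrative sketch, not part of this commit; the capacity and cap values are made up):

```cpp
#include "rocksdb/cache.h"

// Sketch: build a HyperClockCache with a custom eviction_effort_cap.
std::shared_ptr<ROCKSDB_NAMESPACE::Cache> MakeTunedHyperClockCache() {
  ROCKSDB_NAMESPACE::HyperClockCacheOptions opts(
      /*_capacity=*/64 << 20,
      /*_estimated_entry_charge=*/0);  // 0 selects AutoHCC
  // New in this PR: caps CPU spent scanning pinned (unevictable) entries
  // on each eviction attempt. 30 is the default suggested by the
  // benchmarks above; smaller values trade memory overshoot under
  // pinning stress for less CPU.
  opts.eviction_effort_cap = 30;
  return opts.MakeSharedCache();
}
```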
Diffstat (limited to 'cache')
-rw-r--r--  cache/cache_bench_tool.cc                  5
-rw-r--r--  cache/clock_cache.cc                     135
-rw-r--r--  cache/clock_cache.h                       68
-rw-r--r--  cache/compressed_secondary_cache_test.cc   2
-rw-r--r--  cache/lru_cache_test.cc                   60
5 files changed, 208 insertions, 62 deletions
diff --git a/cache/cache_bench_tool.cc b/cache/cache_bench_tool.cc
index 89945abf7..07cd1a1f6 100644
--- a/cache/cache_bench_tool.cc
+++ b/cache/cache_bench_tool.cc
@@ -48,6 +48,10 @@ DEFINE_uint64(cache_size, 1 * GiB,
"Number of bytes to use as a cache of uncompressed data.");
DEFINE_int32(num_shard_bits, -1,
"ShardedCacheOptions::shard_bits. Default = auto");
+DEFINE_int32(
+ eviction_effort_cap,
+ ROCKSDB_NAMESPACE::HyperClockCacheOptions(1, 1).eviction_effort_cap,
+ "HyperClockCacheOptions::eviction_effort_cap");
DEFINE_double(resident_ratio, 0.25,
"Ratio of keys fitting in cache to keyspace.");
@@ -391,6 +395,7 @@ class CacheBench {
FLAGS_cache_size, /*estimated_entry_charge=*/0, FLAGS_num_shard_bits);
opts.hash_seed = BitwiseAnd(FLAGS_seed, INT32_MAX);
opts.memory_allocator = allocator;
+ opts.eviction_effort_cap = FLAGS_eviction_effort_cap;
if (FLAGS_cache_type == "fixed_hyper_clock_cache" ||
FLAGS_cache_type == "hyper_clock_cache") {
opts.estimated_entry_charge = FLAGS_value_bytes_estimate > 0
diff --git a/cache/clock_cache.cc b/cache/clock_cache.cc
index e37c03fb5..de2e56186 100644
--- a/cache/clock_cache.cc
+++ b/cache/clock_cache.cc
@@ -15,6 +15,7 @@
#include <cassert>
#include <cinttypes>
#include <cstddef>
+#include <cstdint>
#include <exception>
#include <functional>
#include <numeric>
@@ -94,7 +95,8 @@ inline void Unref(const ClockHandle& h, uint64_t count = 1) {
(void)old_meta;
}
-inline bool ClockUpdate(ClockHandle& h, bool* purgeable = nullptr) {
+inline bool ClockUpdate(ClockHandle& h, BaseClockTable::EvictionData* data,
+ bool* purgeable = nullptr) {
uint64_t meta;
if (purgeable) {
assert(*purgeable == false);
@@ -126,6 +128,7 @@ inline bool ClockUpdate(ClockHandle& h, bool* purgeable = nullptr) {
(meta >> ClockHandle::kReleaseCounterShift) & ClockHandle::kCounterMask;
if (acquire_count != release_count) {
// Only clock update entries with no outstanding refs
+ data->seen_pinned_count++;
return false;
}
if ((meta >> ClockHandle::kStateShift == ClockHandle::kStateVisible) &&
@@ -149,6 +152,8 @@ inline bool ClockUpdate(ClockHandle& h, bool* purgeable = nullptr) {
<< ClockHandle::kStateShift) |
(meta & ClockHandle::kHitBitMask))) {
// Took ownership.
+ data->freed_charge += h.GetTotalCharge();
+ data->freed_count += 1;
return true;
} else {
// Compare-exchange failing probably
@@ -356,6 +361,18 @@ void ConstApplyToEntriesRange(const Func& func, const HandleImpl* begin,
}
}
+constexpr uint32_t kStrictCapacityLimitBit = 1u << 31;
+
+uint32_t SanitizeEncodeEecAndScl(int eviction_effort_cap,
+ bool strict_capacity_limit) {
+ eviction_effort_cap = std::max(int{1}, eviction_effort_cap);
+ eviction_effort_cap =
+ std::min(static_cast<int>(~kStrictCapacityLimitBit), eviction_effort_cap);
+ uint32_t eec_and_scl = static_cast<uint32_t>(eviction_effort_cap);
+ eec_and_scl |= strict_capacity_limit ? kStrictCapacityLimitBit : 0;
+ return eec_and_scl;
+}
+
} // namespace
void ClockHandleBasicData::FreeData(MemoryAllocator* allocator) const {
@@ -385,17 +402,20 @@ HandleImpl* BaseClockTable::StandaloneInsert(
template <class Table>
typename Table::HandleImpl* BaseClockTable::CreateStandalone(
- ClockHandleBasicData& proto, size_t capacity, bool strict_capacity_limit,
+ ClockHandleBasicData& proto, size_t capacity, uint32_t eec_and_scl,
bool allow_uncharged) {
Table& derived = static_cast<Table&>(*this);
typename Table::InsertState state;
derived.StartInsert(state);
const size_t total_charge = proto.GetTotalCharge();
- if (strict_capacity_limit) {
+ // NOTE: we can use eec_and_scl as eviction_effort_cap below because
+ // strict_capacity_limit=true is supposed to disable the limit on eviction
+ // effort, and a large value effectively does that.
+ if (eec_and_scl & kStrictCapacityLimitBit) {
Status s = ChargeUsageMaybeEvictStrict<Table>(
total_charge, capacity,
- /*need_evict_for_occupancy=*/false, state);
+ /*need_evict_for_occupancy=*/false, eec_and_scl, state);
if (!s.ok()) {
if (allow_uncharged) {
proto.total_charge = 0;
@@ -407,7 +427,7 @@ typename Table::HandleImpl* BaseClockTable::CreateStandalone(
// Case strict_capacity_limit == false
bool success = ChargeUsageMaybeEvictNonStrict<Table>(
total_charge, capacity,
- /*need_evict_for_occupancy=*/false, state);
+ /*need_evict_for_occupancy=*/false, eec_and_scl, state);
if (!success) {
// Force the issue
usage_.FetchAddRelaxed(total_charge);
@@ -420,7 +440,7 @@ typename Table::HandleImpl* BaseClockTable::CreateStandalone(
template <class Table>
Status BaseClockTable::ChargeUsageMaybeEvictStrict(
size_t total_charge, size_t capacity, bool need_evict_for_occupancy,
- typename Table::InsertState& state) {
+ uint32_t eviction_effort_cap, typename Table::InsertState& state) {
if (total_charge > capacity) {
return Status::MemoryLimit(
"Cache entry too large for a single cache shard: " +
@@ -445,7 +465,8 @@ Status BaseClockTable::ChargeUsageMaybeEvictStrict(
}
if (request_evict_charge > 0) {
EvictionData data;
- static_cast<Table*>(this)->Evict(request_evict_charge, state, &data);
+ static_cast<Table*>(this)->Evict(request_evict_charge, state, &data,
+ eviction_effort_cap);
occupancy_.FetchSub(data.freed_count);
if (LIKELY(data.freed_charge > need_evict_charge)) {
assert(data.freed_count > 0);
@@ -475,7 +496,7 @@ Status BaseClockTable::ChargeUsageMaybeEvictStrict(
template <class Table>
inline bool BaseClockTable::ChargeUsageMaybeEvictNonStrict(
size_t total_charge, size_t capacity, bool need_evict_for_occupancy,
- typename Table::InsertState& state) {
+ uint32_t eviction_effort_cap, typename Table::InsertState& state) {
// For simplicity, we consider that either the cache can accept the insert
// with no evictions, or we must evict enough to make (at least) enough
// space. It could lead to unnecessary failures or excessive evictions in
@@ -511,7 +532,8 @@ inline bool BaseClockTable::ChargeUsageMaybeEvictNonStrict(
}
EvictionData data;
if (need_evict_charge > 0) {
- static_cast<Table*>(this)->Evict(need_evict_charge, state, &data);
+ static_cast<Table*>(this)->Evict(need_evict_charge, state, &data,
+ eviction_effort_cap);
// Deal with potential occupancy deficit
if (UNLIKELY(need_evict_for_occupancy) && data.freed_count == 0) {
assert(data.freed_charge == 0);
@@ -530,11 +552,7 @@ inline bool BaseClockTable::ChargeUsageMaybeEvictNonStrict(
return true;
}
-void BaseClockTable::TrackAndReleaseEvictedEntry(
- ClockHandle* h, BaseClockTable::EvictionData* data) {
- data->freed_charge += h->GetTotalCharge();
- data->freed_count += 1;
-
+void BaseClockTable::TrackAndReleaseEvictedEntry(ClockHandle* h) {
bool took_value_ownership = false;
if (eviction_callback_) {
// For key reconstructed from hash
@@ -551,11 +569,20 @@ void BaseClockTable::TrackAndReleaseEvictedEntry(
MarkEmpty(*h);
}
+bool IsEvictionEffortExceeded(const BaseClockTable::EvictionData& data,
+ uint32_t eviction_effort_cap) {
+ // Basically checks whether the ratio of useful effort to wasted effort is
+ // too low, with a start-up allowance for wasted effort before any useful
+ // effort.
+ return (data.freed_count + 1U) * uint64_t{eviction_effort_cap} <=
+ data.seen_pinned_count;
+}
+
template <class Table>
Status BaseClockTable::Insert(const ClockHandleBasicData& proto,
typename Table::HandleImpl** handle,
Cache::Priority priority, size_t capacity,
- bool strict_capacity_limit) {
+ uint32_t eec_and_scl) {
using HandleImpl = typename Table::HandleImpl;
Table& derived = static_cast<Table&>(*this);
@@ -573,9 +600,12 @@ Status BaseClockTable::Insert(const ClockHandleBasicData& proto,
// strict_capacity_limit, but mostly pessimistic.
bool use_standalone_insert = false;
const size_t total_charge = proto.GetTotalCharge();
- if (strict_capacity_limit) {
+ // NOTE: we can use eec_and_scl as eviction_effort_cap below because
+ // strict_capacity_limit=true is supposed to disable the limit on eviction
+ // effort, and a large value effectively does that.
+ if (eec_and_scl & kStrictCapacityLimitBit) {
Status s = ChargeUsageMaybeEvictStrict<Table>(
- total_charge, capacity, need_evict_for_occupancy, state);
+ total_charge, capacity, need_evict_for_occupancy, eec_and_scl, state);
if (!s.ok()) {
// Revert occupancy
occupancy_.FetchSubRelaxed(1);
@@ -584,7 +614,7 @@ Status BaseClockTable::Insert(const ClockHandleBasicData& proto,
} else {
// Case strict_capacity_limit == false
bool success = ChargeUsageMaybeEvictNonStrict<Table>(
- total_charge, capacity, need_evict_for_occupancy, state);
+ total_charge, capacity, need_evict_for_occupancy, eec_and_scl, state);
if (!success) {
// Revert occupancy
occupancy_.FetchSubRelaxed(1);
@@ -688,8 +718,7 @@ void BaseClockTable::TEST_ReleaseNMinus1(ClockHandle* h, size_t n) {
#endif
FixedHyperClockTable::FixedHyperClockTable(
- size_t capacity, bool /*strict_capacity_limit*/,
- CacheMetadataChargePolicy metadata_charge_policy,
+ size_t capacity, CacheMetadataChargePolicy metadata_charge_policy,
MemoryAllocator* allocator,
const Cache::EvictionCallback* eviction_callback, const uint32_t* hash_seed,
const Opts& opts)
@@ -1084,7 +1113,8 @@ inline void FixedHyperClockTable::ReclaimEntryUsage(size_t total_charge) {
}
inline void FixedHyperClockTable::Evict(size_t requested_charge, InsertState&,
- EvictionData* data) {
+ EvictionData* data,
+ uint32_t eviction_effort_cap) {
// precondition
assert(requested_charge > 0);
@@ -1105,10 +1135,10 @@ inline void FixedHyperClockTable::Evict(size_t requested_charge, InsertState&,
for (;;) {
for (size_t i = 0; i < step_size; i++) {
HandleImpl& h = array_[ModTableSize(Lower32of64(old_clock_pointer + i))];
- bool evicting = ClockUpdate(h);
+ bool evicting = ClockUpdate(h, data);
if (evicting) {
Rollback(h.hashed_key, &h);
- TrackAndReleaseEvictedEntry(&h, data);
+ TrackAndReleaseEvictedEntry(&h);
}
}
@@ -1119,6 +1149,10 @@ inline void FixedHyperClockTable::Evict(size_t requested_charge, InsertState&,
if (old_clock_pointer >= max_clock_pointer) {
return;
}
+ if (IsEvictionEffortExceeded(*data, eviction_effort_cap)) {
+ eviction_effort_exceeded_count_.FetchAddRelaxed(1);
+ return;
+ }
// Advance clock pointer (concurrently)
old_clock_pointer = clock_pointer_.FetchAddRelaxed(step_size);
@@ -1133,10 +1167,11 @@ ClockCacheShard<Table>::ClockCacheShard(
const Cache::EvictionCallback* eviction_callback, const uint32_t* hash_seed,
const typename Table::Opts& opts)
: CacheShardBase(metadata_charge_policy),
- table_(capacity, strict_capacity_limit, metadata_charge_policy, allocator,
- eviction_callback, hash_seed, opts),
+ table_(capacity, metadata_charge_policy, allocator, eviction_callback,
+ hash_seed, opts),
capacity_(capacity),
- strict_capacity_limit_(strict_capacity_limit) {
+ eec_and_scl_(SanitizeEncodeEecAndScl(opts.eviction_effort_cap,
+ strict_capacity_limit)) {
// Initial charge metadata should not exceed capacity
assert(table_.GetUsage() <= capacity_.LoadRelaxed() ||
capacity_.LoadRelaxed() < sizeof(HandleImpl));
@@ -1212,7 +1247,11 @@ void ClockCacheShard<Table>::SetCapacity(size_t capacity) {
template <class Table>
void ClockCacheShard<Table>::SetStrictCapacityLimit(
bool strict_capacity_limit) {
- strict_capacity_limit_.StoreRelaxed(strict_capacity_limit);
+ if (strict_capacity_limit) {
+ eec_and_scl_.FetchOrRelaxed(kStrictCapacityLimitBit);
+ } else {
+ eec_and_scl_.FetchAndRelaxed(~kStrictCapacityLimitBit);
+ }
// next Insert will take care of any necessary evictions
}
@@ -1234,7 +1273,7 @@ Status ClockCacheShard<Table>::Insert(const Slice& key,
proto.total_charge = charge;
return table_.template Insert<Table>(proto, handle, priority,
capacity_.LoadRelaxed(),
- strict_capacity_limit_.LoadRelaxed());
+ eec_and_scl_.LoadRelaxed());
}
template <class Table>
@@ -1249,9 +1288,9 @@ typename Table::HandleImpl* ClockCacheShard<Table>::CreateStandalone(
proto.value = obj;
proto.helper = helper;
proto.total_charge = charge;
- return table_.template CreateStandalone<Table>(
- proto, capacity_.LoadRelaxed(), strict_capacity_limit_.LoadRelaxed(),
- allow_uncharged);
+ return table_.template CreateStandalone<Table>(proto, capacity_.LoadRelaxed(),
+ eec_and_scl_.LoadRelaxed(),
+ allow_uncharged);
}
template <class Table>
@@ -1503,14 +1542,20 @@ void BaseHyperClockCache<Table>::ReportProblems(
const std::shared_ptr<Logger>& info_log) const {
if (info_log->GetInfoLogLevel() <= InfoLogLevel::DEBUG_LEVEL) {
LoadVarianceStats slot_stats;
+ uint64_t eviction_effort_exceeded_count = 0;
this->ForEachShard([&](const BaseHyperClockCache<Table>::Shard* shard) {
size_t count = shard->GetTableAddressCount();
for (size_t i = 0; i < count; ++i) {
slot_stats.Add(IsSlotOccupied(*shard->GetTable().HandlePtr(i)));
}
+ eviction_effort_exceeded_count +=
+ shard->GetTable().GetEvictionEffortExceededCount();
});
ROCKS_LOG_AT_LEVEL(info_log, InfoLogLevel::DEBUG_LEVEL,
"Slot occupancy stats: %s", slot_stats.Report().c_str());
+ ROCKS_LOG_AT_LEVEL(info_log, InfoLogLevel::DEBUG_LEVEL,
+ "Eviction effort exceeded: %" PRIu64,
+ eviction_effort_exceeded_count);
}
}
@@ -1908,8 +1953,7 @@ class AutoHyperClockTable::ChainRewriteLock {
};
AutoHyperClockTable::AutoHyperClockTable(
- size_t capacity, bool /*strict_capacity_limit*/,
- CacheMetadataChargePolicy metadata_charge_policy,
+ size_t capacity, CacheMetadataChargePolicy metadata_charge_policy,
MemoryAllocator* allocator,
const Cache::EvictionCallback* eviction_callback, const uint32_t* hash_seed,
const Opts& opts)
@@ -2590,7 +2634,8 @@ using ClockUpdateChainLockedOpData =
template <class OpData>
void AutoHyperClockTable::PurgeImplLocked(OpData* op_data,
ChainRewriteLock& rewrite_lock,
- size_t home) {
+ size_t home,
+ BaseClockTable::EvictionData* data) {
constexpr bool kIsPurge = std::is_same_v<OpData, PurgeLockedOpData>;
constexpr bool kIsClockUpdateChain =
std::is_same_v<OpData, ClockUpdateChainLockedOpData>;
@@ -2632,7 +2677,7 @@ void AutoHyperClockTable::PurgeImplLocked(OpData* op_data,
assert(home == BottomNBits(h->hashed_key[1], home_shift));
if constexpr (kIsClockUpdateChain) {
// Clock update and/or check for purgeable (under (de)construction)
- if (ClockUpdate(*h, &purgeable)) {
+ if (ClockUpdate(*h, data, &purgeable)) {
// Remember for finishing eviction
op_data->push_back(h);
// Entries for eviction become purgeable
@@ -2642,6 +2687,7 @@ void AutoHyperClockTable::PurgeImplLocked(OpData* op_data,
}
} else {
(void)op_data;
+ (void)data;
purgeable = ((h->meta.Load() >> ClockHandle::kStateShift) &
ClockHandle::kStateShareableBit) == 0;
}
@@ -2719,7 +2765,8 @@ using PurgeOpData = const UniqueId64x2;
using ClockUpdateChainOpData = ClockUpdateChainLockedOpData;
template <class OpData>
-void AutoHyperClockTable::PurgeImpl(OpData* op_data, size_t home) {
+void AutoHyperClockTable::PurgeImpl(OpData* op_data, size_t home,
+ BaseClockTable::EvictionData* data) {
// Early efforts to make AutoHCC fully wait-free ran into too many problems
// that needed obscure and potentially inefficient work-arounds to have a
// chance at working.
@@ -2800,9 +2847,9 @@ void AutoHyperClockTable::PurgeImpl(OpData* op_data, size_t home) {
if (!rewrite_lock.IsEnd()) {
if constexpr (kIsPurge) {
PurgeLockedOpData* locked_op_data{};
- PurgeImplLocked(locked_op_data, rewrite_lock, home);
+ PurgeImplLocked(locked_op_data, rewrite_lock, home, data);
} else {
- PurgeImplLocked(op_data, rewrite_lock, home);
+ PurgeImplLocked(op_data, rewrite_lock, home, data);
}
}
}
@@ -3405,7 +3452,8 @@ void AutoHyperClockTable::EraseUnRefEntries() {
}
void AutoHyperClockTable::Evict(size_t requested_charge, InsertState& state,
- EvictionData* data) {
+ EvictionData* data,
+ uint32_t eviction_effort_cap) {
// precondition
assert(requested_charge > 0);
@@ -3463,12 +3511,12 @@ void AutoHyperClockTable::Evict(size_t requested_charge, InsertState& state,
if (home >= used_length) {
break;
}
- PurgeImpl(&to_finish_eviction, home);
+ PurgeImpl(&to_finish_eviction, home, data);
}
}
for (HandleImpl* h : to_finish_eviction) {
- TrackAndReleaseEvictedEntry(h, data);
+ TrackAndReleaseEvictedEntry(h);
// NOTE: setting likely_empty_slot here can cause us to reduce the
// portion of "at home" entries, probably because an evicted entry
// is more likely to come back than a random new entry and would be
@@ -3496,6 +3544,11 @@ void AutoHyperClockTable::Evict(size_t requested_charge, InsertState& state,
if (old_clock_pointer + step_size >= max_clock_pointer) {
return;
}
+
+ if (IsEvictionEffortExceeded(*data, eviction_effort_cap)) {
+ eviction_effort_exceeded_count_.FetchAddRelaxed(1);
+ return;
+ }
}
}
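To make the new give-up condition concrete, here is a self-contained restatement of the `IsEvictionEffortExceeded()` predicate from the hunks above, with a small worked example (the driver and its numbers are illustrative, not from the commit):

```cpp
#include <cstdint>
#include <cstdio>

struct EvictionData {
  size_t freed_count = 0;        // useful effort: entries evicted
  size_t seen_pinned_count = 0;  // wasted effort: pinned entries skipped
};

// Same ratio test as in the diff: stop evicting once wasted effort
// outpaces useful effort by more than the cap, with a cap-sized
// start-up allowance before the first successful eviction.
bool IsEvictionEffortExceeded(const EvictionData& data, uint32_t cap) {
  return (data.freed_count + 1U) * uint64_t{cap} <= data.seen_pinned_count;
}

int main() {
  // With the default cap of 30 and nothing freed yet, a clock walk
  // gives up after seeing 30 pinned entries...
  std::printf("%d\n", IsEvictionEffortExceeded({0, 30}, 30));  // 1: give up
  // ...and each successful eviction buys 30 more entries of patience.
  std::printf("%d\n", IsEvictionEffortExceeded({2, 89}, 30));  // 0: continue
  std::printf("%d\n", IsEvictionEffortExceeded({2, 90}, 30));  // 1: give up
}
```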
diff --git a/cache/clock_cache.h b/cache/clock_cache.h
index 3086e7e97..54d656021 100644
--- a/cache/clock_cache.h
+++ b/cache/clock_cache.h
@@ -11,6 +11,7 @@
#include <array>
#include <atomic>
+#include <climits>
#include <cstddef>
#include <cstdint>
#include <memory>
@@ -374,6 +375,14 @@ struct ClockHandle : public ClockHandleBasicData {
class BaseClockTable {
public:
+ struct BaseOpts {
+ explicit BaseOpts(int _eviction_effort_cap)
+ : eviction_effort_cap(_eviction_effort_cap) {}
+ explicit BaseOpts(const HyperClockCacheOptions& opts)
+ : BaseOpts(opts.eviction_effort_cap) {}
+ int eviction_effort_cap;
+ };
+
BaseClockTable(CacheMetadataChargePolicy metadata_charge_policy,
MemoryAllocator* allocator,
const Cache::EvictionCallback* eviction_callback,
@@ -386,13 +395,13 @@ class BaseClockTable {
template <class Table>
typename Table::HandleImpl* CreateStandalone(ClockHandleBasicData& proto,
size_t capacity,
- bool strict_capacity_limit,
+ uint32_t eec_and_scl,
bool allow_uncharged);
template <class Table>
Status Insert(const ClockHandleBasicData& proto,
typename Table::HandleImpl** handle, Cache::Priority priority,
- size_t capacity, bool strict_capacity_limit);
+ size_t capacity, uint32_t eec_and_scl);
void Ref(ClockHandle& handle);
@@ -406,12 +415,17 @@ class BaseClockTable {
uint64_t GetYieldCount() const { return yield_count_.LoadRelaxed(); }
+ uint64_t GetEvictionEffortExceededCount() const {
+ return eviction_effort_exceeded_count_.LoadRelaxed();
+ }
+
struct EvictionData {
size_t freed_charge = 0;
size_t freed_count = 0;
+ size_t seen_pinned_count = 0;
};
- void TrackAndReleaseEvictedEntry(ClockHandle* h, EvictionData* data);
+ void TrackAndReleaseEvictedEntry(ClockHandle* h);
#ifndef NDEBUG
// Acquire N references
@@ -436,6 +450,7 @@ class BaseClockTable {
template <class Table>
Status ChargeUsageMaybeEvictStrict(size_t total_charge, size_t capacity,
bool need_evict_for_occupancy,
+ uint32_t eviction_effort_cap,
typename Table::InsertState& state);
// Helper for updating `usage_` for new entry with given `total_charge`
@@ -449,6 +464,7 @@ class BaseClockTable {
template <class Table>
bool ChargeUsageMaybeEvictNonStrict(size_t total_charge, size_t capacity,
bool need_evict_for_occupancy,
+ uint32_t eviction_effort_cap,
typename Table::InsertState& state);
protected: // data
@@ -461,9 +477,15 @@ class BaseClockTable {
RelaxedAtomic<uint64_t> clock_pointer_{};
// Counter for number of times we yield to wait on another thread.
+ // It is normal for this to occur rarely in normal operation.
// (Relaxed: a simple stat counter.)
RelaxedAtomic<uint64_t> yield_count_{};
+ // Counter for number of times eviction effort cap is exceeded.
+ // It is normal for this to occur rarely in normal operation.
+ // (Relaxed: a simple stat counter.)
+ RelaxedAtomic<uint64_t> eviction_effort_exceeded_count_{};
+
// TODO: is this separation needed if we don't do background evictions?
ALIGN_AS(CACHE_LINE_SIZE)
// Number of elements in the table.
@@ -517,17 +539,19 @@ class FixedHyperClockTable : public BaseClockTable {
inline void SetStandalone() { standalone = true; }
}; // struct HandleImpl
- struct Opts {
- explicit Opts(size_t _estimated_value_size)
- : estimated_value_size(_estimated_value_size) {}
- explicit Opts(const HyperClockCacheOptions& opts) {
+ struct Opts : public BaseOpts {
+ explicit Opts(size_t _estimated_value_size, int _eviction_effort_cap)
+ : BaseOpts(_eviction_effort_cap),
+ estimated_value_size(_estimated_value_size) {}
+ explicit Opts(const HyperClockCacheOptions& opts)
+ : BaseOpts(opts.eviction_effort_cap) {
assert(opts.estimated_entry_charge > 0);
estimated_value_size = opts.estimated_entry_charge;
}
size_t estimated_value_size;
};
- FixedHyperClockTable(size_t capacity, bool strict_capacity_limit,
+ FixedHyperClockTable(size_t capacity,
CacheMetadataChargePolicy metadata_charge_policy,
MemoryAllocator* allocator,
const Cache::EvictionCallback* eviction_callback,
@@ -549,7 +573,8 @@ class FixedHyperClockTable : public BaseClockTable {
// Runs the clock eviction algorithm trying to reclaim at least
// requested_charge. Returns how much is evicted, which could be less
// if it appears impossible to evict the requested amount without blocking.
- void Evict(size_t requested_charge, InsertState& state, EvictionData* data);
+ void Evict(size_t requested_charge, InsertState& state, EvictionData* data,
+ uint32_t eviction_effort_cap);
HandleImpl* Lookup(const UniqueId64x2& hashed_key);
@@ -803,18 +828,20 @@ class AutoHyperClockTable : public BaseClockTable {
}
}; // struct HandleImpl
- struct Opts {
- explicit Opts(size_t _min_avg_value_size)
- : min_avg_value_size(_min_avg_value_size) {}
+ struct Opts : public BaseOpts {
+ explicit Opts(size_t _min_avg_value_size, int _eviction_effort_cap)
+ : BaseOpts(_eviction_effort_cap),
+ min_avg_value_size(_min_avg_value_size) {}
- explicit Opts(const HyperClockCacheOptions& opts) {
+ explicit Opts(const HyperClockCacheOptions& opts)
+ : BaseOpts(opts.eviction_effort_cap) {
assert(opts.estimated_entry_charge == 0);
min_avg_value_size = opts.min_avg_entry_charge;
}
size_t min_avg_value_size;
};
- AutoHyperClockTable(size_t capacity, bool strict_capacity_limit,
+ AutoHyperClockTable(size_t capacity,
CacheMetadataChargePolicy metadata_charge_policy,
MemoryAllocator* allocator,
const Cache::EvictionCallback* eviction_callback,
@@ -841,7 +868,8 @@ class AutoHyperClockTable : public BaseClockTable {
// Runs the clock eviction algorithm trying to reclaim at least
// requested_charge. Returns how much is evicted, which could be less
// if it appears impossible to evict the requested amount without blocking.
- void Evict(size_t requested_charge, InsertState& state, EvictionData* data);
+ void Evict(size_t requested_charge, InsertState& state, EvictionData* data,
+ uint32_t eviction_effort_cap);
HandleImpl* Lookup(const UniqueId64x2& hashed_key);
@@ -906,7 +934,8 @@ class AutoHyperClockTable : public BaseClockTable {
// with proper handling to ensure all existing data is seen even in the
// presence of concurrent insertions, etc. (See implementation.)
template <class OpData>
- void PurgeImpl(OpData* op_data, size_t home = SIZE_MAX);
+ void PurgeImpl(OpData* op_data, size_t home = SIZE_MAX,
+ EvictionData* data = nullptr);
// An RAII wrapper for locking a chain of entries for removals. See
// implementation.
@@ -916,7 +945,7 @@ class AutoHyperClockTable : public BaseClockTable {
// implementation.
template <class OpData>
void PurgeImplLocked(OpData* op_data, ChainRewriteLock& rewrite_lock,
- size_t home);
+ size_t home, EvictionData* data);
// Update length_info_ as much as possible without waiting, given a known
// usable (ready for inserts and lookups) grow_home. (Previous grow_homes
@@ -1078,9 +1107,10 @@ class ALIGN_AS(CACHE_LINE_SIZE) ClockCacheShard final : public CacheShardBase {
// (Relaxed: eventual consistency/update is OK)
RelaxedAtomic<size_t> capacity_;
- // Whether to reject insertion if cache reaches its full capacity.
+ // Encodes eviction_effort_cap (bottom 31 bits) and strict_capacity_limit
+ // (top bit). See HyperClockCacheOptions::eviction_effort_cap etc.
// (Relaxed: eventual consistency/update is OK)
- RelaxedAtomic<bool> strict_capacity_limit_;
+ RelaxedAtomic<uint32_t> eec_and_scl_;
}; // class ClockCacheShard
template <class Table>
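As an aside on the `eec_and_scl_` encoding documented just above (an illustrative sketch, not code from the commit): the top bit carries strict_capacity_limit and the low 31 bits carry the sanitized eviction_effort_cap, so the two fields could be unpacked like this:

```cpp
#include <cstdint>

constexpr uint32_t kStrictCapacityLimitBit = 1u << 31;

// Hypothetical decode helpers for the packed field; the real code
// mostly uses the packed value directly.
inline bool DecodeStrictCapacityLimit(uint32_t eec_and_scl) {
  return (eec_and_scl & kStrictCapacityLimitBit) != 0;
}
inline uint32_t DecodeEvictionEffortCap(uint32_t eec_and_scl) {
  return eec_and_scl & ~kStrictCapacityLimitBit;
}
```

Notably, the commit never decodes the cap when the strict bit is set: per the NOTE comments in clock_cache.cc, the packed value itself is passed as the eviction_effort_cap, because strict_capacity_limit is supposed to disable the limit on eviction effort, and a value with the top bit set is effectively unbounded.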
diff --git a/cache/compressed_secondary_cache_test.cc b/cache/compressed_secondary_cache_test.cc
index d72680b84..79f40868a 100644
--- a/cache/compressed_secondary_cache_test.cc
+++ b/cache/compressed_secondary_cache_test.cc
@@ -992,6 +992,8 @@ class CompressedSecCacheTestWithTiered
/*_capacity=*/0,
/*_estimated_entry_charge=*/256 << 10,
/*_num_shard_bits=*/0);
+ // eviction_effort_cap setting simply to avoid churn in existing test
+ hcc_opts.eviction_effort_cap = 100;
TieredCacheOptions opts;
lru_opts.capacity = 0;
lru_opts.num_shard_bits = 0;
diff --git a/cache/lru_cache_test.cc b/cache/lru_cache_test.cc
index 27fd5cc85..91b1d02c1 100644
--- a/cache/lru_cache_test.cc
+++ b/cache/lru_cache_test.cc
@@ -389,12 +389,13 @@ class ClockCacheTest : public testing::Test {
}
}
- void NewShard(size_t capacity, bool strict_capacity_limit = true) {
+ void NewShard(size_t capacity, bool strict_capacity_limit = true,
+ int eviction_effort_cap = 30) {
DeleteShard();
shard_ =
reinterpret_cast<Shard*>(port::cacheline_aligned_alloc(sizeof(Shard)));
- TableOpts opts{1 /*value_size*/};
+ TableOpts opts{1 /*value_size*/, eviction_effort_cap};
new (shard_)
Shard(capacity, strict_capacity_limit, kDontChargeCacheMetadata,
/*allocator*/ nullptr, &eviction_callback_, &hash_seed_, opts);
@@ -445,12 +446,20 @@ class ClockCacheTest : public testing::Test {
return Slice(reinterpret_cast<const char*>(&hashed_key), 16U);
}
+ // A bad hash function for testing / stressing collision handling
static inline UniqueId64x2 TestHashedKey(char key) {
// For testing hash near-collision behavior, put the variance in
// hashed_key in bits that are unlikely to be used as hash bits.
return {(static_cast<uint64_t>(key) << 56) + 1234U, 5678U};
}
+ // A reasonable hash function, for testing "typical behavior" etc.
+ template <typename T>
+ static inline UniqueId64x2 CheapHash(T i) {
+ return {static_cast<uint64_t>(i) * uint64_t{0x85EBCA77C2B2AE63},
+ static_cast<uint64_t>(i) * uint64_t{0xC2B2AE3D27D4EB4F}};
+ }
+
Shard* shard_ = nullptr;
private:
@@ -683,6 +692,53 @@ TYPED_TEST(ClockCacheTest, ClockEvictionTest) {
}
}
+TYPED_TEST(ClockCacheTest, ClockEvictionEffortCapTest) {
+ using HandleImpl = typename ClockCacheTest<TypeParam>::Shard::HandleImpl;
+ for (bool strict_capacity_limit : {true, false}) {
+ SCOPED_TRACE("strict_capacity_limit = " +
+ std::to_string(strict_capacity_limit));
+ for (int eec : {-42, 0, 1, 10, 100, 1000}) {
+ SCOPED_TRACE("eviction_effort_cap = " + std::to_string(eec));
+ constexpr size_t kCapacity = 1000;
+ // Start with much larger capacity to ensure that we can go way over
+ // capacity without reaching table occupancy limit.
+ this->NewShard(3 * kCapacity, strict_capacity_limit, eec);
+ auto& shard = *this->shard_;
+ shard.SetCapacity(kCapacity);
+
+ // Nearly fill the cache with pinned entries, then add a bunch of
+ // non-pinned entries. eviction_effort_cap should affect how many
+ // evictable entries are present beyond the cache capacity, despite
+ // being evictable.
+ constexpr size_t kCount = kCapacity - 1;
+ std::unique_ptr<HandleImpl* []> ha { new HandleImpl* [kCount] {} };
+ for (size_t i = 0; i < 2 * kCount; ++i) {
+ UniqueId64x2 hkey = this->CheapHash(i);
+ ASSERT_OK(shard.Insert(
+ this->TestKey(hkey), hkey, nullptr /*value*/, &kNoopCacheItemHelper,
+ 1 /*charge*/, i < kCount ? &ha[i] : nullptr, Cache::Priority::LOW));
+ }
+
+ if (strict_capacity_limit) {
+ // If strict_capacity_limit is enabled, the cache will never exceed its
+ // capacity
+ EXPECT_EQ(shard.GetOccupancyCount(), kCapacity);
+ } else {
+ // Rough inverse relationship between cap and possible memory
+ // explosion, which shows up as increased table occupancy count.
+ int effective_eec = std::max(int{1}, eec) + 1;
+ EXPECT_NEAR(shard.GetOccupancyCount() * 1.0,
+ kCount * (1 + 1.4 / effective_eec),
+ kCount * (0.6 / effective_eec) + 1.0);
+ }
+
+ for (size_t i = 0; i < kCount; ++i) {
+ shard.Release(ha[i]);
+ }
+ }
+ }
+}
+
namespace {
struct DeleteCounter {
int deleted = 0;