Age | Commit message | Author
2024-09-30 | Add comment for memory usage in BeginTransaction() and WriteBatch::Clear() (#13042) | Changyu Bi
Summary: ... to note that memory may not be freed when reusing a transaction. This means reusing a large transaction can cause excessive memory usage and it may be better to destruct the transaction object in some cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13042 Test Plan: no code change. Reviewed By: jowlyzhang Differential Revision: D63570612 Pulled By: cbi42 fbshipit-source-id: f19ff556f76d54831fb94715e8808035d07e25fa
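To illustrate the trade-off this comment documents, here is a minimal sketch (assuming an already-opened `TransactionDB*` named `txn_db`; keys and values are placeholders): reusing a transaction via the `old_txn` argument of `BeginTransaction()` avoids reallocation but may keep the old write batch's memory, while deleting and recreating the transaction releases it.

```
#include <cassert>
#include "rocksdb/utilities/transaction_db.h"

using namespace ROCKSDB_NAMESPACE;

// Sketch only: txn_db is assumed to be an already-opened TransactionDB*.
void ReuseVsRecreate(TransactionDB* txn_db) {
  WriteOptions write_opts;
  Transaction* txn = txn_db->BeginTransaction(write_opts);
  assert(txn != nullptr);
  Status s = txn->Put("key1", "a large value ...");
  assert(s.ok());
  s = txn->Commit();
  assert(s.ok());

  // Option 1: reuse the transaction object. Cheaper, but the underlying
  // WriteBatch memory may not be freed, so a previously large transaction
  // can keep a large memory footprint.
  txn = txn_db->BeginTransaction(write_opts, TransactionOptions(), txn);

  // Option 2: destroy and recreate the transaction to release that memory.
  delete txn;
  txn = txn_db->BeginTransaction(write_opts);
  // ... use txn ...
  delete txn;
}
```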
2024-09-27 | Fix some missing values in stress test (#13039) | Yu Zhang
Summary: When `avoid_flush_during_shutdown` is false, DB will flush the memtables if there is some unpersisted data: https://github.com/facebook/rocksdb/blob/79790cf2a80fb5e5b6799ebd69d3fb2ebe71d612/db/db_impl/db_impl.cc#L505-L510 `has_unpersisted_data_` is a flag that is only turned on when WAL is disabled, for example: https://github.com/facebook/rocksdb/blob/79790cf2a80fb5e5b6799ebd69d3fb2ebe71d612/db/db_impl/db_impl_write.cc#L525-L528 In other cases, it just has its default false value. So if disableWAL is false and avoid_flush_during_shutdown is false, close won't flush memtables. The stress test also does not flush or sync the WAL, so data could be missing, while reopen in the stress test doesn't tolerate missing data. To make the test simpler, this change makes the stress test always flush/sync the WAL during reopen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13039 Reviewed By: hx235 Differential Revision: D63494695 Pulled By: jowlyzhang fbshipit-source-id: 8f0fd9ed50a482a3955abc0882257ecc2e95926d
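As a rough sketch of the situation described above (not the stress test code itself; `db` and the keys are illustrative): writes that disable the WAL are only persisted by a memtable flush, while writes that keep the WAL can be made durable by flushing and syncing the WAL, which is what the stress test now does explicitly during reopen.

```
#include "rocksdb/db.h"

using namespace ROCKSDB_NAMESPACE;

// Sketch only: db is assumed to be an already-opened DB*.
void PersistBeforeClose(DB* db) {
  WriteOptions wal_write;          // WAL enabled (default)
  db->Put(wal_write, "k1", "v1");

  WriteOptions no_wal_write;
  no_wal_write.disableWAL = true;  // turns on has_unpersisted_data_ internally
  db->Put(no_wal_write, "k2", "v2");

  // With the WAL enabled, explicitly flushing and syncing the WAL (as the
  // stress test now does during reopen) makes the first write durable.
  db->FlushWAL(/*sync=*/true);

  // Data written with disableWAL=true is only persisted by a memtable flush,
  // either an explicit one or the flush DB close performs when
  // avoid_flush_during_shutdown == false and there is unpersisted data.
  db->Flush(FlushOptions());
}
```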
2024-09-26 | Bug fix and test BuildDBOptions (#13038) | Peter Dillinger
Summary: The following DBOptions were not being propagated through BuildDBOptions, which could at least lead to settings being lost through `GetOptionsFromString()`, possibly elsewhere as well: * background_close_inactive_wals * write_dbid_to_manifest * write_identity_file * prefix_seek_opt_in_only Pull Request resolved: https://github.com/facebook/rocksdb/pull/13038 Test Plan: This problem was not being caught by OptionsSettableTest.DBOptionsAllFieldsSettable when the option was omitted from both options_helper.cc and options_settable_test.cc. I have added to the test to catch future instances (and the updated test was how I found three of the four missing options). The same kind of bug seems to be caught by ColumnFamilyOptionsAllFieldsSettable, and AFAIK analogous code does not exist for BlockBasedTableOptions. Reviewed By: ltamasi Differential Revision: D63483779 Pulled By: pdillinger fbshipit-source-id: a5d5f6e434174bacb8e5d251b767e81e62b7225a
2024-09-25 | More accurate accounting of compressed cache memory (#13032) | anand76
Summary: When an item is inserted into the compressed secondary cache, this PR calculates the charge using the malloc_usable_size of the allocated memory, as well as the unique pointer allocation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13032 Test Plan: New unit test Reviewed By: pdillinger Differential Revision: D63418493 Pulled By: anand1976 fbshipit-source-id: 1db2835af6867442bb8cf6d9bf412e120ddd3824
2024-09-25 | Remove unnecessary semi-colon (#13034) | Jay Huh
Summary: As title Pull Request resolved: https://github.com/facebook/rocksdb/pull/13034 Reviewed By: ltamasi Differential Revision: D63413712 Pulled By: jaykorean fbshipit-source-id: 0070761b0d9de1f50fe0baf235643d36aeb9f7f5
2024-09-25 | Respect lowest_used_cache_tier option for compressed blocks (#13030) | anand76
Summary: If the lowest_used_cache_tier DB option is set to kVolatileTier, skip insertion of compressed blocks into the secondary cache. Previously, these were always inserted into the secondary cache via the InsertSaved() method, leading to pollution of the secondary cache with blocks that would never be read. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13030 Test Plan: Add a new unit test Reviewed By: pdillinger Differential Revision: D63329841 Pulled By: anand1976 fbshipit-source-id: 14d2fce2ed309401d9ad4d2e7c356218b6673f7b
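A hedged configuration sketch of the setup this change affects: a block cache with a compressed secondary cache attached, and `lowest_used_cache_tier` set to `kVolatileTier` so compressed blocks are not inserted into the secondary cache. Cache capacities below are arbitrary examples, not recommendations.

```
#include "rocksdb/cache.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

using namespace ROCKSDB_NAMESPACE;

Options MakeOptionsWithVolatileTierOnly() {
  // Primary (uncompressed) block cache with a compressed secondary cache.
  LRUCacheOptions cache_opts(/*_capacity=*/64 << 20);
  CompressedSecondaryCacheOptions secondary_opts;
  secondary_opts.capacity = 32 << 20;
  cache_opts.secondary_cache = NewCompressedSecondaryCache(secondary_opts);

  BlockBasedTableOptions table_opts;
  table_opts.block_cache = cache_opts.MakeSharedCache();

  Options options;
  options.table_factory.reset(NewBlockBasedTableFactory(table_opts));

  // Only the volatile (primary) tier is used; with this fix, compressed
  // blocks are no longer inserted into the secondary cache in this mode.
  options.lowest_used_cache_tier = CacheTier::kVolatileTier;
  return options;
}
```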
2024-09-25 | More info in CompactionServiceJobInfo and CompactionJobStats (#13029) | Jay Huh
Summary: Add the following to the `CompactionServiceJobInfo` - compaction_reason - is_full_compaction - is_manual_compaction - bottommost_level Added `is_remote_compaction` to the `CompactionJobStats` and set initial values to avoid UB for uninitialized values. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13029 Test Plan: ``` ./compaction_service_test --gtest_filter="*CompactionInfo*" ``` Reviewed By: anand1976 Differential Revision: D63322878 Pulled By: jaykorean fbshipit-source-id: f02a66ca45e660b9d354a43837d8ec6beb7621fb
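A small hedged sketch of how a remote-compaction integration might consume the new fields; the logging helper and its call site are hypothetical, but the field names are the ones listed above.

```
#include <iostream>
#include "rocksdb/listener.h"
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;

// Hypothetical helper a CompactionService implementation could call when it
// receives a job, using the fields added in this change.
void LogCompactionServiceJob(const CompactionServiceJobInfo& info) {
  std::cout << "db=" << info.db_name
            << " reason=" << static_cast<int>(info.compaction_reason)
            << " full=" << info.is_full_compaction
            << " manual=" << info.is_manual_compaction
            << " bottommost=" << info.bottommost_level << std::endl;
}
```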
2024-09-20 | Update HISTORY.md, version.h, and the format compatibility check script for the 9.7 release (#13027) | Levi Tamasi
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/13027 Reviewed By: jowlyzhang Differential Revision: D63158344 Pulled By: ltamasi fbshipit-source-id: e650a0024155d52c7aa2afd0f242b8363071279a
2024-09-20 | Steps toward deprecating implicit prefix seek, related fixes (#13026) | Peter Dillinger
Summary: With some new use cases onboarding to prefix extractors/seek/filters, one of the risks is existing iterator code, e.g. for maintenance tasks, being unintentionally subject to prefix seek semantics. This is a longstanding known design flaw with prefix seek, and `prefix_same_as_start` and `auto_prefix_mode` were steps in the direction of making that obsolete. However, we can't just immediately set `total_order_seek` to true by default, because that would impact so much code instantly. Here we add a new DB option, `prefix_seek_opt_in_only` that basically allows users to transition to the future behavior when they are ready. When set to true, all iterators will be treated as if `total_order_seek=true` and then the only ways to get prefix seek semantics are with `prefix_same_as_start` or `auto_prefix_mode`. Related fixes / changes: * Make sure that `prefix_same_as_start` and `auto_prefix_mode` are compatible with (or override) `total_order_seek` (depending on your interpretation). * Fix a bug in which a new iterator after dynamically changing the prefix extractor might mix different prefix semantics between memtable and SSTs. Both should use the latest extractor semantics, which means iterators ignoring memtable prefix filters with an old extractor. And that means passing the latest prefix extractor to new memtable iterators that might use prefix seek. (Without the fix, the test added for this fails in many ways.) Suggested follow-up: * Investigate a FIXME where a MergeIteratorBuilder is created in db_impl.cc. No unit test detects a change in value that should impact correctness. * Make memtable prefix bloom compatible with `auto_prefix_mode`, which might require involving the memtablereps because we don't know at iterator creation time (only seek time) whether an auto_prefix_mode seek will be a prefix seek. * Add `prefix_same_as_start` testing to db_stress Pull Request resolved: https://github.com/facebook/rocksdb/pull/13026 Test Plan: tests updated, added. Add combination of `total_order_seek=true` and `auto_prefix_mode=true` to stress test. Ran `make blackbox_crash_test` for a long while. Manually ran tests with `prefix_seek_opt_in_only=true` as default, looking for unexpected issues. I inspected most of the results and migrated many tests to be ready for such a change (but not all). Reviewed By: ltamasi Differential Revision: D63147378 Pulled By: pdillinger fbshipit-source-id: 1f4477b730683d43b4be7e933338583702d3c25e
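A hedged sketch of the opt-in model described above (the prefix extractor length and key names are arbitrary): with `prefix_seek_opt_in_only=true`, plain iterators behave as if `total_order_seek=true`, and prefix semantics must be requested explicitly per read.

```
#include <memory>
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/slice_transform.h"

using namespace ROCKSDB_NAMESPACE;

void PrefixSeekOptIn(DB* db) {
  // The DB is assumed to have been opened with:
  //   options.prefix_extractor.reset(NewFixedPrefixTransform(4));
  //   options.prefix_seek_opt_in_only = true;  // new option in this change

  // Plain iterator: total-order semantics, safe for maintenance scans.
  ReadOptions scan_opts;
  std::unique_ptr<Iterator> it(db->NewIterator(scan_opts));
  for (it->Seek("user"); it->Valid(); it->Next()) {
    // sees keys in full key order, not limited to one prefix
  }

  // Explicit prefix seek: opt in per iterator.
  ReadOptions prefix_opts;
  prefix_opts.prefix_same_as_start = true;  // or prefix_opts.auto_prefix_mode = true;
  std::unique_ptr<Iterator> pit(db->NewIterator(prefix_opts));
  for (pit->Seek("user"); pit->Valid(); pit->Next()) {
    // iteration stops once keys leave the "user" prefix
  }
}
```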
2024-09-20 | Load latest options from OPTIONS file in Remote host (#13025) | Jay Huh
Summary: We've been serializing and deserializing DBOptions and CFOptions (and other CF info) as part of `CompactionServiceInput`. These are all readily available in the OPTIONS file and the remote worker can read the OPTIONS file to obtain the same information. This helps reduce the size of the payload significantly. In a very rare scenario, if the OPTIONS file is purged due to an options change by the primary host while the remote host is loading the latest options, the load may fail. In this case, we just retry once. This also solves the problem where we had to open the default CF with the CFOptions from another CF if the remote compaction is for a non-default column family. (TODO comment in /db_impl_secondary.cc) Pull Request resolved: https://github.com/facebook/rocksdb/pull/13025 Test Plan: Unit Tests ``` ./compaction_service_test ``` ``` ./compaction_job_test ``` Also tested with Meta's internal Offload Infra Reviewed By: anand1976, cbi42 Differential Revision: D63100109 Pulled By: jaykorean fbshipit-source-id: b7162695e31e2c5a920daa7f432842163a5b156d
2024-09-20 | Make Cache a customizable class (#13024) | anand76
Summary: This PR allows a Cache object to be created using the object registry. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13024 Reviewed By: pdillinger Differential Revision: D63043233 Pulled By: anand1976 fbshipit-source-id: 5bc3f7c29b35ad62638ff8205451303e2cecea9d
2024-09-19 | Compact one file at a time for FIFO temperature change compactions (#13018) | Changyu Bi
Summary: Per customer request, we should not merge multiple SST files together during temperature change compaction, since this can cause FIFO TTL compactions to be delayed. This PR changes the compaction picking logic to pick one file at a time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13018 Test Plan: * updated some existing unit tests to test this new behavior. Reviewed By: jowlyzhang Differential Revision: D62883292 Pulled By: cbi42 fbshipit-source-id: 6a9fc8c296b5d9b17168ef6645f25153241c8b93
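For context, a hedged sketch of the kind of FIFO configuration whose temperature change compactions are affected; the TTL and age threshold are arbitrary examples, and `file_temperature_age_thresholds` is assumed to be the relevant knob.

```
#include "rocksdb/advanced_options.h"
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;

Options MakeFifoTieringOptions() {
  Options options;
  options.compaction_style = kCompactionStyleFIFO;
  options.ttl = 7 * 24 * 60 * 60;  // FIFO TTL: drop data older than 7 days

  // Move files older than 1 day to cold storage. With this change, each
  // temperature change compaction rewrites one file at a time instead of
  // merging several, so FIFO TTL compactions are not delayed.
  options.compaction_options_fifo.file_temperature_age_thresholds = {
      {Temperature::kCold, 24 * 60 * 60}};
  return options;
}
```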
2024-09-19 | Change the semantics of blob_garbage_collection_force_threshold to provide better control over space amp (#13022) | Levi Tamasi
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/13022 Currently, `blob_garbage_collection_force_threshold` applies to the oldest batch of blob files, which is typically only a small subset of the blob files currently eligible for garbage collection. This can result in a form of head-of-line blocking: no GC-triggered compactions will be scheduled if the oldest batch does not currently exceed the threshold, even if a lot of higher-numbered blob files do. This can in turn lead to high space amplification that exceeds the soft bound implicit in the force threshold (e.g. 50% would suggest a space amp of <2 and 75% would imply a space amp of <4). The patch changes the semantics of this configuration threshold to apply to the entire set of blob files that are eligible for garbage collection based on `blob_garbage_collection_age_cutoff`. This provides more intuitive semantics for the option and can provide a better write amp/space amp trade-off. (Note that GC-triggered compactions still pick the same SST files as before, so triggered GC still targets the oldest blob files.) Reviewed By: jowlyzhang Differential Revision: D62977860 fbshipit-source-id: a999f31fe9cdda313de513f0e7a6fc707424d4a3
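A hedged configuration sketch tying the threshold to the space amp bound mentioned above (a force threshold of t corresponds to a soft space amp bound of roughly 1/(1-t) over the eligible blob files); the numbers are examples only.

```
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;

Options MakeBlobGcOptions() {
  Options options;
  options.enable_blob_files = true;
  options.enable_blob_garbage_collection = true;

  // The oldest 25% of blob files are eligible for GC.
  options.blob_garbage_collection_age_cutoff = 0.25;

  // With the new semantics, once >= 50% of the data in *all* eligible blob
  // files is garbage, GC-triggered compactions are scheduled. A 50%
  // threshold implies a space amp of < 2 for that set; 75% would imply < 4.
  options.blob_garbage_collection_force_threshold = 0.5;
  return options;
}
```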
2024-09-19 | Steps toward making IDENTITY file obsolete (#13019) | Peter Dillinger
Summary: * Set write_dbid_to_manifest=true by default * Add new option write_identity_file (default true) that allows us to opt-in to future behavior without identity file * Refactor related DB open code to minimize code duplication _Recommend hiding whitespace changes for review_ Intended follow-up: add support to ldb for reading and even replacing the DB identity in the manifest. Could be a variant of `update_manifest` command or based on it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13019 Test Plan: unit tests and stress test updated for new functionality Reviewed By: anand1976 Differential Revision: D62898229 Pulled By: pdillinger fbshipit-source-id: c08b25cf790610b034e51a9de0dc78b921abbcf0
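A hedged sketch of the two options named above under the new defaults (`write_dbid_to_manifest=true`); opting out of the IDENTITY file is the future behavior this change prepares for. The path and function name are illustrative.

```
#include <string>
#include "rocksdb/db.h"

using namespace ROCKSDB_NAMESPACE;

Status OpenWithoutIdentityFile(const std::string& path, DB** db) {
  Options options;
  options.create_if_missing = true;
  options.write_dbid_to_manifest = true;  // now the default
  options.write_identity_file = false;    // opt in to skipping IDENTITY

  Status s = DB::Open(options, path, db);
  if (s.ok()) {
    std::string db_id;
    s = (*db)->GetDbIdentity(db_id);  // still available, read from the MANIFEST
  }
  return s;
}
```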
2024-09-18 | Add an option to dump wal seqno gaps (#13014) | Yu Zhang
Summary: Add an option `--only_print_seqno_gaps` for wal dump to help with debugging. This option will check the continuity of sequence numbers in WAL logs, assuming `seq_per_batch` is false. `--walfile` option now also takes a directory, and it will check all WAL logs in the directory in chronological order. When a gap is found, we can further check if it's related to operations like external file ingestion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13014 Test Plan: Manually tested Reviewed By: ltamasi Differential Revision: D62989115 Pulled By: jowlyzhang fbshipit-source-id: 22e3326344e7969ff9d5091d21fec2935770fbc7
2024-09-18 | Fix and generalize framework for filtering range queries, etc. (#13005) | Peter Dillinger
Summary: There was a subtle design/contract bug in the previous version of range filtering in experimental.h If someone implemented a key segments extractor with "all or nothing" fixed size segments, that could result in unsafe range filtering. For example, with two segments of width 3: ``` x = 0x|12 34 56|78 9A 00| y = 0x|12 34 56||78 9B z = 0x|12 34 56|78 9C 00| ``` Segment 1 of y (empty) is out of order with segment 1 of x and z. I have re-worked the contract to make it clear what does work, and implemented a standard extractor for fixed-size segments, CappedKeySegmentsExtractor. The safe approach for filtering is to consume as much as is available for a segment in the case of a short key. I have also added support for min-max filtering with reverse byte-wise comparator, which is probably the 2nd most common comparator for RocksDB users (because of MySQL). It might seem that a min-max filter doesn't care about forward or reverse ordering, but it does when trying to determine whether in input range from segment values v1 to v2, where it so happens that v2 is byte-wise less than v1, is an empty forward interval or a non-empty reverse interval. At least in the current setup, we don't have that context. A new unit test (with some refactoring) tests CappedKeySegmentsExtractor, reverse byte-wise comparator, and the corresponding min-max filter. I have also (contractually / mathematically) generalized the framework to comparators other than the byte-wise comparator, and made other generalizations to make the extractor limitations more explicitly connected to the particular filters and filtering used--at least in description. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13005 Test Plan: added unit tests as described Reviewed By: jowlyzhang Differential Revision: D62769784 Pulled By: pdillinger fbshipit-source-id: 0d41f0d0273586bdad55e4aa30381ebc861f7044
2024-09-18 | Fix orphaned files in SstFileManager (#13015) | Nick Brekhus
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/13015 `Close()`ing a database now releases tracked files in `SstFileManager`. Previously this space would be leaked until the database was later reopened. Reviewed By: jowlyzhang Differential Revision: D62590773 fbshipit-source-id: 5461bd253d974ac4967ad52fee92e2650f8a9a28
2024-09-17 | Update folly Github hash (#13017) | Yu Zhang
Summary: The internal codebase is updated for the coro directory's graduation from experimental. Updating our build script for a newer version with this change too. Using this hash: https://github.com/facebook/folly/commit/03041f014b6e6ebb6119ffae8b7a37308f52e913 Pull Request resolved: https://github.com/facebook/rocksdb/pull/13017 Reviewed By: nickbrekhus Differential Revision: D62763932 Pulled By: jowlyzhang fbshipit-source-id: 1b211707fbc7d974d6d6ceaf577e174424bb44ed
2024-09-17 | Fix a bug with auto recovery on WAL write error (#12995) | Changyu Bi
Summary: A recent crash test failure shows that auto recovery from WAL write failure can cause CFs to be inconsistent. A unit test repro in P1569398553. The following is an example sequence of events: ``` 0. manual_wal_flush is true. There are multiple CFs in a DB. 1. Submit a write batch with updates to multiple CF 2. A FlushWAL or a memtable swtich that will try to write the buffered WAL data. Fail this write so that buffered WAL data is dropped: https://github.com/facebook/rocksdb/blob/4b1d595306fae602b56d2aa5128b11b1162bfa81/file/writable_file_writer.cc#L624 The error needs to be retryable to start background auto recovery. 3. One CF successfully flushes its memtable during auto recovery. 4. Crash the process. 5. Reopen the DB, one CF will have the update as a result of successful flush. Other CFs will miss all the updates in the write batch since WAL does not have them. ``` This can happen if a users configures manual_wal_flush, uses more than one CF, and can hit retryable error for WAL writes. This PR is a short-term fix that upgrades WAL related errors to fatal and not trigger auto recovery. A long-term fix may be not drop buffered WAL data by checking how much data is actually written, or require atomically flushing all column families during error recovery from this kind of errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12995 Test Plan: added unit test to check error severity and if recovery is triggered. A crash test repro command that fails in a few runs before this PR: ``` python3 ./tools/db_crashtest.py blackbox --interval=60 --metadata_write_fault_one_in=1000 --column_families=10 --exclude_wal_from_write_fault_injection=0 --manual_wal_flush_one_in=1000 --WAL_size_limit_MB=10240 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --adm_policy=1 --advise_random_on_open=1 --allow_data_in_errors=True --allow_fallocate=1 --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=1 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=100 --block_align=1 --block_protection_bytes_per_key=0 --block_size=16384 --bloom_before_level=2147483647 --bottommost_compression_type=none --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_index_and_filter_blocks_with_high_priority=1 --cache_size=33554432 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=1 --compaction_readahead_size=1048576 --compaction_ttl=0 --compress_format_version=1 --compressed_secondary_cache_size=8388608 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=4 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=0 --db_write_buffer_size=0 --decouple_partitioned_filters=1 --default_temperature=kCold --default_write_temperature=kWarm --delete_obsolete_files_period_micros=30000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 
--disable_file_deletions_one_in=1000000 --disable_manual_compaction_one_in=1000000 --disable_wal=0 --dump_malloc_stats=1 --enable_checksum_handoff=1 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=0 --enable_index_compression=0 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=1 --enable_sst_partitioner_factory=0 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=1 --error_recovery_with_no_fault_injection=1 --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --fill_cache=1 --flush_one_in=1000000 --format_version=6 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=10000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --index_block_restart_interval=4 --index_shortening=1 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --last_level_temperature=kWarm --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=10000 --log_file_time_to_roll=0 --log_readahead_size=0 --long_running_snapshots=0 --lowest_used_cache_tier=2 --manifest_preallocation_size=5120 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=16 --max_total_wal_size=0 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=2097152 --memtable_insert_hint_per_batch=1 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.001 --memtable_protection_bytes_per_key=2 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=1 --metadata_read_fault_one_in=0 --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=2 --open_files=100 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --optimize_filters_for_hits=0 --optimize_filters_for_memory=0 --optimize_multiget_for_io=0 --paranoid_file_checks=1 --paranoid_memory_checks=0 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=10000 --periodic_compaction_seconds=0 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=524288 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=10000 --sample_for_compression=5 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --set_options_one_in=10000 --skip_stats_update_on_db_open=1 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=1048576 --sqfc_name=bar --sqfc_version=1 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=600 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=1 --subcompactions=2 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=3 --uncache_aggressiveness=8 
--universal_max_read_amp=-1 --unpartitioned_pinning=2 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=0 --use_attribute_group=1 --use_delta_encoding=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=1 --use_multi_cf_iterator=1 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=1 --use_sqfc_for_range_queries=0 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_compression=1 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=4194304 --write_dbid_to_manifest=0 --write_fault_one_in=50 --writepercent=35 --ops_per_thread=100000 --preserve_unverified_changes=1 ``` Reviewed By: hx235 Differential Revision: D62888510 Pulled By: cbi42 fbshipit-source-id: 308bdbbb8d897cc8eba950155cd0e37cf7eb76fe
2024-09-17 | Remove last user of AutoHeaders.RECURSIVE_GLOB | Jon Janzen
Summary: I came across this code while buckifying parts of folly and fizz in open source. This is pretty hacky code and cleaning it up doesn't seem that hard, so I did it. Reviewed By: zertosh, pdillinger Differential Revision: D62781766 fbshipit-source-id: 43714bce992c53149d1e619063d803297362fb5d
2024-09-17 | Add missing RemapFileSystem::ReopenWritableFile (#12941) | leipeng
Summary: `RemapFileSystem::ReopenWritableFile` is missing, add it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12941 Reviewed By: pdillinger Differential Revision: D61822540 Pulled By: cbi42 fbshipit-source-id: dc228f7e8b842216f63de8b925cb663898455345
2024-09-14 | Deshim coro in fbcode/internal_repo_rocksdb | Nicholas Ormrod
Summary: The following rules were deshimmed: ``` //folly/experimental/coro:accumulate -> //folly/coro:accumulate //folly/experimental/coro:async_generator -> //folly/coro:async_generator //folly/experimental/coro:async_pipe -> //folly/coro:async_pipe //folly/experimental/coro:async_scope -> //folly/coro:async_scope //folly/experimental/coro:async_stack -> //folly/coro:async_stack //folly/experimental/coro:baton -> //folly/coro:baton //folly/experimental/coro:blocking_wait -> //folly/coro:blocking_wait //folly/experimental/coro:collect -> //folly/coro:collect //folly/experimental/coro:concat -> //folly/coro:concat //folly/experimental/coro:coroutine -> //folly/coro:coroutine //folly/experimental/coro:current_executor -> //folly/coro:current_executor //folly/experimental/coro:detach_on_cancel -> //folly/coro:detach_on_cancel //folly/experimental/coro:detail_barrier -> //folly/coro:detail_barrier //folly/experimental/coro:detail_barrier_task -> //folly/coro:detail_barrier_task //folly/experimental/coro:detail_current_async_frame -> //folly/coro:detail_current_async_frame //folly/experimental/coro:detail_helpers -> //folly/coro:detail_helpers //folly/experimental/coro:detail_malloc -> //folly/coro:detail_malloc //folly/experimental/coro:detail_manual_lifetime -> //folly/coro:detail_manual_lifetime //folly/experimental/coro:detail_traits -> //folly/coro:detail_traits //folly/experimental/coro:filter -> //folly/coro:filter //folly/experimental/coro:future_util -> //folly/coro:future_util //folly/experimental/coro:generator -> //folly/coro:generator //folly/experimental/coro:gmock_helpers -> //folly/coro:gmock_helpers //folly/experimental/coro:gtest_helpers -> //folly/coro:gtest_helpers //folly/experimental/coro:inline_task -> //folly/coro:inline_task //folly/experimental/coro:invoke -> //folly/coro:invoke //folly/experimental/coro:merge -> //folly/coro:merge //folly/experimental/coro:mutex -> //folly/coro:mutex //folly/experimental/coro:promise -> //folly/coro:promise //folly/experimental/coro:result -> //folly/coro:result //folly/experimental/coro:retry -> //folly/coro:retry //folly/experimental/coro:rust_adaptors -> //folly/coro:rust_adaptors //folly/experimental/coro:scope_exit -> //folly/coro:scope_exit //folly/experimental/coro:shared_lock -> //folly/coro:shared_lock //folly/experimental/coro:shared_mutex -> //folly/coro:shared_mutex //folly/experimental/coro:sleep -> //folly/coro:sleep //folly/experimental/coro:small_unbounded_queue -> //folly/coro:small_unbounded_queue //folly/experimental/coro:task -> //folly/coro:task //folly/experimental/coro:timed_wait -> //folly/coro:timed_wait //folly/experimental/coro:timeout -> //folly/coro:timeout //folly/experimental/coro:traits -> //folly/coro:traits //folly/experimental/coro:transform -> //folly/coro:transform //folly/experimental/coro:unbounded_queue -> //folly/coro:unbounded_queue //folly/experimental/coro:via_if_async -> //folly/coro:via_if_async //folly/experimental/coro:with_async_stack -> //folly/coro:with_async_stack //folly/experimental/coro:with_cancellation -> //folly/coro:with_cancellation //folly/experimental/coro:bounded_queue -> //folly/coro:bounded_queue //folly/experimental/coro:shared_promise -> //folly/coro:shared_promise //folly/experimental/coro:cleanup -> //folly/coro:cleanup //folly/experimental/coro:auto_cleanup_fwd -> //folly/coro:auto_cleanup_fwd //folly/experimental/coro:auto_cleanup -> //folly/coro:auto_cleanup ``` The following headers were deshimmed: ``` folly/experimental/coro/Accumulate.h -> 
folly/coro/Accumulate.h folly/experimental/coro/Accumulate-inl.h -> folly/coro/Accumulate-inl.h folly/experimental/coro/AsyncGenerator.h -> folly/coro/AsyncGenerator.h folly/experimental/coro/AsyncPipe.h -> folly/coro/AsyncPipe.h folly/experimental/coro/AsyncScope.h -> folly/coro/AsyncScope.h folly/experimental/coro/AsyncStack.h -> folly/coro/AsyncStack.h folly/experimental/coro/Baton.h -> folly/coro/Baton.h folly/experimental/coro/BlockingWait.h -> folly/coro/BlockingWait.h folly/experimental/coro/Collect.h -> folly/coro/Collect.h folly/experimental/coro/Collect-inl.h -> folly/coro/Collect-inl.h folly/experimental/coro/Concat.h -> folly/coro/Concat.h folly/experimental/coro/Concat-inl.h -> folly/coro/Concat-inl.h folly/experimental/coro/Coroutine.h -> folly/coro/Coroutine.h folly/experimental/coro/CurrentExecutor.h -> folly/coro/CurrentExecutor.h folly/experimental/coro/DetachOnCancel.h -> folly/coro/DetachOnCancel.h folly/experimental/coro/detail/Barrier.h -> folly/coro/detail/Barrier.h folly/experimental/coro/detail/BarrierTask.h -> folly/coro/detail/BarrierTask.h folly/experimental/coro/detail/CurrentAsyncFrame.h -> folly/coro/detail/CurrentAsyncFrame.h folly/experimental/coro/detail/Helpers.h -> folly/coro/detail/Helpers.h folly/experimental/coro/detail/Malloc.h -> folly/coro/detail/Malloc.h folly/experimental/coro/detail/ManualLifetime.h -> folly/coro/detail/ManualLifetime.h folly/experimental/coro/detail/Traits.h -> folly/coro/detail/Traits.h folly/experimental/coro/Filter.h -> folly/coro/Filter.h folly/experimental/coro/Filter-inl.h -> folly/coro/Filter-inl.h folly/experimental/coro/FutureUtil.h -> folly/coro/FutureUtil.h folly/experimental/coro/Generator.h -> folly/coro/Generator.h folly/experimental/coro/GmockHelpers.h -> folly/coro/GmockHelpers.h folly/experimental/coro/GtestHelpers.h -> folly/coro/GtestHelpers.h folly/experimental/coro/detail/InlineTask.h -> folly/coro/detail/InlineTask.h folly/experimental/coro/Invoke.h -> folly/coro/Invoke.h folly/experimental/coro/Merge.h -> folly/coro/Merge.h folly/experimental/coro/Merge-inl.h -> folly/coro/Merge-inl.h folly/experimental/coro/Mutex.h -> folly/coro/Mutex.h folly/experimental/coro/Promise.h -> folly/coro/Promise.h folly/experimental/coro/Result.h -> folly/coro/Result.h folly/experimental/coro/Retry.h -> folly/coro/Retry.h folly/experimental/coro/RustAdaptors.h -> folly/coro/RustAdaptors.h folly/experimental/coro/ScopeExit.h -> folly/coro/ScopeExit.h folly/experimental/coro/SharedLock.h -> folly/coro/SharedLock.h folly/experimental/coro/SharedMutex.h -> folly/coro/SharedMutex.h folly/experimental/coro/Sleep.h -> folly/coro/Sleep.h folly/experimental/coro/Sleep-inl.h -> folly/coro/Sleep-inl.h folly/experimental/coro/SmallUnboundedQueue.h -> folly/coro/SmallUnboundedQueue.h folly/experimental/coro/Task.h -> folly/coro/Task.h folly/experimental/coro/TimedWait.h -> folly/coro/TimedWait.h folly/experimental/coro/Timeout.h -> folly/coro/Timeout.h folly/experimental/coro/Timeout-inl.h -> folly/coro/Timeout-inl.h folly/experimental/coro/Traits.h -> folly/coro/Traits.h folly/experimental/coro/Transform.h -> folly/coro/Transform.h folly/experimental/coro/Transform-inl.h -> folly/coro/Transform-inl.h folly/experimental/coro/UnboundedQueue.h -> folly/coro/UnboundedQueue.h folly/experimental/coro/ViaIfAsync.h -> folly/coro/ViaIfAsync.h folly/experimental/coro/WithAsyncStack.h -> folly/coro/WithAsyncStack.h folly/experimental/coro/WithCancellation.h -> folly/coro/WithCancellation.h folly/experimental/coro/BoundedQueue.h -> 
folly/coro/BoundedQueue.h folly/experimental/coro/SharedPromise.h -> folly/coro/SharedPromise.h folly/experimental/coro/Cleanup.h -> folly/coro/Cleanup.h folly/experimental/coro/AutoCleanup-fwd.h -> folly/coro/AutoCleanup-fwd.h folly/experimental/coro/AutoCleanup.h -> folly/coro/AutoCleanup.h ``` This is a codemod. It was automatically generated and will be landed once it is approved and tests are passing in sandcastle. You have been added as a reviewer by Sentinel or Butterfly. Autodiff project: dcoro Autodiff partition: fbcode.internal_repo_rocksdb Autodiff bookmark: ad.dcoro.fbcode.internal_repo_rocksdb Reviewed By: dtolnay Differential Revision: D62684411 fbshipit-source-id: 8dbd31ab64fcdd99435d322035b9668e3200e0a3
2024-09-13 | Fix wraparound in SstFileManager (#13010) | Nick Brekhus
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/13010 The OnAddFile cur_compactions_reserved_size_ accounting causes wraparound when re-opening a database with an unowned SstFileManager and during recovery. It was introduced in #4164 which addresses out of space recovery with an unclear purpose. Compaction jobs do this accounting via EnoughRoomForCompaction/OnCompactionCompletion and to my understanding would never reuse a sst file name. Reviewed By: anand1976 Differential Revision: D62535775 fbshipit-source-id: a7c44d6e0a4b5ff74bc47abfe57c32ca6770243d
2024-09-13 | Fix a couple of missing cases of retry on corruption (#13007) | anand76
Summary: For SST checksum mismatch corruptions in the read path, RocksDB retries the read if the underlying file system supports verification and reconstruction of data (`FSSupportedOps::kVerifyAndReconstructRead`). There were a couple of places where the retry was missing - reading the SST footer and the properties block. This PR fixes the retry in those cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13007 Test Plan: Add new unit tests Reviewed By: jaykorean Differential Revision: D62519186 Pulled By: anand1976 fbshipit-source-id: 50aa38f18f2a53531a9fc8d4ccdf34fbf034ed59
2024-09-12 | Fix a bug in ReFitLevel() where `FileMetaData::being_compacted` is not cleared (#13009) | Changyu Bi
Summary: in ReFitLevel(), we were not setting being_compacted to false after ReFitLevel() is done. This is not an issue if refit level is successful, since new FileMetaData is created for files at the target level. However, if there's an error during ReFitLevel(), e.g., a Manifest write failure, we should clear the being_compacted field for these files. Otherwise, these files will not be picked for compaction until db reopen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13009 Test Plan: existing test. - stress test failure in T200339331 should not happen anymore. Reviewed By: hx235 Differential Revision: D62597169 Pulled By: cbi42 fbshipit-source-id: 0ba659806da6d6d4b42384fc95268b2d7bad720e
2024-09-10 | Add an internal API MemTableList::GetEditForDroppingCurrentVersion (#13001) | Yu Zhang
Summary: Prepare this internal API to be used by atomic data replacement. The main purpose of this API is to get a `VersionEdit` to mark the entire current `MemTableListVersion` as dropped. Flush needs similar functionality when installing results, so that logic is refactored into a util function `GetDBRecoveryEditForObsoletingMemTables` to be shared by flush and this internal API. To test this internal API, flush's result installation is redirected to use this API when it is flushing all the immutable MemTables in debug mode. It should achieve the exact same results, just with a duplicated `VersionEdit::log_number` field that doesn't upset the recovery logic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13001 Test Plan: Existing tests Reviewed By: pdillinger Differential Revision: D62309591 Pulled By: jowlyzhang fbshipit-source-id: e25914d9a2e281c25ab7ee31a66eaf6adfae4b88
2024-09-10 | Support building db_bench using buck (#13004) | Levi Tamasi
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/13004 The patch extends the buckifier script so it generates a target for `db_bench` as well. Reviewed By: cbi42 Differential Revision: D62407071 fbshipit-source-id: 0cb98a324ce0598ad84a8675aa77b7d0f91bf40c
2024-09-06 | Provide a way to invoke a callback for a Cache handle (#12987) | anand76
Summary: Add the `ApplyToHandle` method to the `Cache` interface to allow a caller to request the invocation of a callback on the given cache handle. The goal here is to allow a cache that manages multiple cache instances to use a callback on a handle to determine which instance it belongs to. For example, the callback can hash the key and use that to pick the correct target instance. This is useful to redirect methods like `Ref` and `Release`, which don't know the cache key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12987 Reviewed By: pdillinger Differential Revision: D62151907 Pulled By: anand1976 fbshipit-source-id: e4ffbbb96eac9061d2ab0e7e1739eea5ebb1cd58
2024-09-06 | Make compaction always use the input version with extra ref protection (#12992) | Yu Zhang
Summary: `Compaction` is already creating its own ref for the input Version: https://github.com/facebook/rocksdb/blob/4b1d595306fae602b56d2aa5128b11b1162bfa81/db/compaction/compaction.cc#L73 And properly Unref it during destruction: https://github.com/facebook/rocksdb/blob/4b1d595306fae602b56d2aa5128b11b1162bfa81/db/compaction/compaction.cc#L450 This PR redirects compaction's access of `cfd->current()` to this input `Version`, to prepare for when a column family's data can be replaced all together, and `cfd->current()` is not safe to access for a compaction job. Because a new `Version` with just some other external files could be installed as `cfd->current()`. The compaction job's expectation of the current `Version` and the corresponding storage info to always have its input files will no longer be guaranteed. My next follow up is to do a similar thing for flush, also to prepare it for when a column family's data can be replaced. I will make it create its own reference of the current `MemTableListVersion` and use it as input, all flush job's access of memtables will be wired to that input `MemTableListVersion`. Similarly this reference will be unreffed during a flush job's destruction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12992 Test Plan: Existing tests Reviewed By: pdillinger Differential Revision: D62212625 Pulled By: jowlyzhang fbshipit-source-id: 9a781213469cf366857a128d50a702af683a046a
2024-09-06 | Add documentation for background job's state transition (#12994) | Yu Zhang
Summary: The `SchedulePending*` API is a bit confusing since it doesn't immediately schedule the work and can be confused with the actual scheduling. So I have changed these to be `EnqueuePending*` and added some documentation for the corresponding state transitions of this background work. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12994 Test Plan: existing tests Reviewed By: cbi42 Differential Revision: D62252746 Pulled By: jowlyzhang fbshipit-source-id: ee68be6ed33070cad9a5004b7b3e16f5bcb041bf
2024-09-06 | Disable WAL fault injection in some case (#13000) | Changyu Bi
Summary: when manual_wal_flush is true and when there are more than 1 CF, WAL fault injection can cause CFs to be inconsistent. See more explanation and repro in T199157789. Disable the combination for now until we have a fix that allows auto recovery. This also helps to see if there's other cause of stress test failures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13000 Test Plan: the following command could repro db consistency failure in a few runs before this PR. From stress test output we can also see that exclude_wal_from_write_fault_injection and metadata_write_fault_one_in are sanitized to 0. ``` python3 ./tools/db_crashtest.py blackbox --interval=60 --metadata_write_fault_one_in=1000 --column_families=10 --exclude_wal_from_write_fault_injection=0 --manual_wal_flush_one_in=1000 --WAL_size_limit_MB=10240 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --adm_policy=1 --advise_random_on_open=1 --allow_data_in_errors=True --allow_fallocate=1 --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=1 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=100 --block_align=1 --block_protection_bytes_per_key=0 --block_size=16384 --bloom_before_level=2147483647 --bottommost_compression_type=none --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_index_and_filter_blocks_with_high_priority=1 --cache_size=33554432 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=1 --compaction_readahead_size=1048576 --compaction_ttl=0 --compress_format_version=1 --compressed_secondary_cache_size=8388608 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=4 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=0 --db_write_buffer_size=0 --decouple_partitioned_filters=1 --default_temperature=kCold --default_write_temperature=kWarm --delete_obsolete_files_period_micros=30000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_file_deletions_one_in=1000000 --disable_manual_compaction_one_in=1000000 --disable_wal=0 --dump_malloc_stats=1 --enable_checksum_handoff=1 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=0 --enable_index_compression=0 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=1 --enable_sst_partitioner_factory=0 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=1 --error_recovery_with_no_fault_injection=1 --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --fill_cache=1 --flush_one_in=1000000 --format_version=6 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=10000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 
--hard_pending_compaction_bytes_limit=274877906944 --index_block_restart_interval=4 --index_shortening=1 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --last_level_temperature=kWarm --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=10000 --log_file_time_to_roll=0 --log_readahead_size=0 --long_running_snapshots=0 --lowest_used_cache_tier=2 --manifest_preallocation_size=5120 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=16 --max_total_wal_size=0 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=2097152 --memtable_insert_hint_per_batch=1 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.001 --memtable_protection_bytes_per_key=2 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=1 --metadata_read_fault_one_in=0 --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=2 --open_files=100 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --optimize_filters_for_hits=0 --optimize_filters_for_memory=0 --optimize_multiget_for_io=0 --paranoid_file_checks=1 --paranoid_memory_checks=0 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=10000 --periodic_compaction_seconds=0 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=524288 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=10000 --sample_for_compression=5 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --set_options_one_in=10000 --skip_stats_update_on_db_open=1 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=1048576 --sqfc_name=bar --sqfc_version=1 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=600 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=1 --subcompactions=2 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=3 --uncache_aggressiveness=8 --universal_max_read_amp=-1 --unpartitioned_pinning=2 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=0 --use_attribute_group=1 --use_delta_encoding=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=1 --use_multi_cf_iterator=1 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=1 --use_sqfc_for_range_queries=0 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_compression=1 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=4194304 --write_dbid_to_manifest=0 --write_fault_one_in=50 
--writepercent=35 --ops_per_thread=100000 --preserve_unverified_changes=1 ``` Reviewed By: hx235 Differential Revision: D62303631 Pulled By: cbi42 fbshipit-source-id: d9441188ee84d53e5e7916f3305e50843fe9fde2
2024-09-06 | More valgrind fixes (#12990) | Peter Dillinger
Summary: * https://github.com/facebook/rocksdb/issues/12936 was insufficient to fix the std::optional false positives. Making a fix validated in CI this time (see https://github.com/facebook/rocksdb/issues/12991) * valgrind grinds to a halt on startup on my dev machine apparently because it expects internet access. Disable its attempts to access the internet when git is using a proxy. * Move PORTABLE=1 from CI job to the Makefile. Without it, valgrind complains about illegal instructions (too new) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12990 Test Plan: manual, watch nightly valgrind job Reviewed By: ltamasi Differential Revision: D62203242 Pulled By: pdillinger fbshipit-source-id: a611b08da7dbd173b0709ed7feb0578729553a17
2024-09-05 | Ensure SSTs compressed in tiered_secondary_cache_test (#12993) | Peter Dillinger
Summary: It appears the arm testsuite is failing because it is building without snappy, which is causing the SST files not to be compressed, which somehow causes these tests to fail. Manually setting LZ4 which is already required. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12993 Test Plan: reproduced and verified fix on ARM laptop Reviewed By: anand1976 Differential Revision: D62216451 Pulled By: pdillinger fbshipit-source-id: 3f21fcd9be0edaa66c7eca0cb7d56b998171e263
2024-09-04 | Fix and clarify ignore_unknown_options (#12989) | Peter Dillinger
Summary: `ignore_unknown_options=true` had an undocumented behavior of having no effect (disallow unknown options) if reading from the same or older major.minor version. Presumably this was intended to catch unintentional addition of new options in a patch release, but there is no automated version compatibility testing between patch releases. So this was a bad choice without such testing support, because it just means users would hit the failure in case of adding features to a patch release. In this diff we respect ignore_unknown_options when reading a file from any newer version, even patch versions, and document this behavior in the API. I don't think it's practical or necessary to test among patch releases in check_format_compatible.sh. This seems like an exceptional case of applying a *different semantics* to patch version updates than to minor/major versions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12989 Test Plan: unit test updated (and refactored) Reviewed By: jaykorean Differential Revision: D62168738 Pulled By: pdillinger fbshipit-source-id: fb3c3ef30f0bbad0d5ffcc4570fb9ef963e7daac
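A hedged sketch of the setting whose semantics are clarified here, using the `options_util` loading API; the path and function name are illustrative.

```
#include <string>
#include <vector>
#include "rocksdb/utilities/options_util.h"

using namespace ROCKSDB_NAMESPACE;

Status LoadOptionsTolerantly(const std::string& db_path) {
  ConfigOptions config_options;
  // With this change, unknown options are ignored whenever the OPTIONS file
  // was written by any newer version, including a newer patch release.
  config_options.ignore_unknown_options = true;

  DBOptions db_options;
  std::vector<ColumnFamilyDescriptor> cf_descs;
  return LoadLatestOptions(config_options, db_path, &db_options, &cf_descs);
}
```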
2024-09-03 | Fix the check format compatible change (#12988) | Jay Huh
Summary: `check_format_compatible` script was broken due to extra comma added in 5b8f5cbcf4324584b914dcbc27bed7c007c82b3e e.g. https://github.com/facebook/rocksdb/actions/runs/10505042711/job/29101787220 ``` ... 2024-08-23T11:44:15.0175202Z == Building 9.5.fb, debug 2024-08-23T11:44:15.0190592Z fatal: ambiguous argument '_tmp_origin/9.5.fb,': unknown revision or path not in the working tree. ... ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12988 Test Plan: ``` tools/check_format_compatible.sh ``` ``` == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 8.6.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/8.6.fb to /tmp/rocksdb_format_compatible_jewoongh/db/8.6.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 8.7.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/8.7.fb to /tmp/rocksdb_format_compatible_jewoongh/db/8.7.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 8.8.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/8.8.fb to /tmp/rocksdb_format_compatible_jewoongh/db/8.8.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 8.9.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/8.9.fb to /tmp/rocksdb_format_compatible_jewoongh/db/8.9.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 8.10.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/8.10.fb to /tmp/rocksdb_format_compatible_jewoongh/db/8.10.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 8.11.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/8.11.fb to /tmp/rocksdb_format_compatible_jewoongh/db/8.11.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 9.0.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/9.0.fb to /tmp/rocksdb_format_compatible_jewoongh/db/9.0.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 9.1.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/9.1.fb to /tmp/rocksdb_format_compatible_jewoongh/db/9.1.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 9.2.fb... 
== Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/9.2.fb to /tmp/rocksdb_format_compatible_jewoongh/db/9.2.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 9.3.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/9.3.fb to /tmp/rocksdb_format_compatible_jewoongh/db/9.3.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 9.4.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/9.4.fb to /tmp/rocksdb_format_compatible_jewoongh/db/9.4.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 9.5.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/9.5.fb to /tmp/rocksdb_format_compatible_jewoongh/db/9.5.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt == Use HEAD (48339d2a65670211bc9c204364a2127ba9b2a460) to open DB generated using 9.6.fb... == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/9.6.fb to /tmp/rocksdb_format_compatible_jewoongh/db/9.6.fb/db_dump.txt == Dumping data from /tmp/rocksdb_format_compatible_jewoongh/db/current to /tmp/rocksdb_format_compatible_jewoongh/db/current/db_dump.txt ==== Compatibility Test PASSED ==== ``` Reviewed By: pdillinger Differential Revision: D62162454 Pulled By: jaykorean fbshipit-source-id: 562225c6cb27e0eb66f241a6f9424dc624d8c837
2024-09-03 | Fix non-ASCII character (#12972) | Symious
Summary: Met the following error while compiling the project. ``` build_tools/check-sources.sh utilities/fault_injection_fs.cc:509: // If there<E2><80><99>s no injected error, then cb will be called asynchronously when utilities/fault_injection_fs.cc:510: // target_ actually finishes the read. But if there<E2><80><99>s an injected error, it utilities/fault_injection_fs.cc:512: // isn<E2><80><99>t invoked at all. ^^^^ Use only ASCII characters in source files make[1]: *** [Makefile:1291: check-sources] Error 1 make[1]: Leaving directory '/home/janus/Github/symious/rocksdb' make: *** [Makefile:1084: check] Error 2 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12972 Reviewed By: hx235 Differential Revision: D61923865 Pulled By: cbi42 fbshipit-source-id: 63af0a38fea15e09a860895bdd5ed0a57700e447
2024-09-03 | Add a new file ingestion option `link_files` (#12980) | Changyu Bi
Summary: Add option `IngestExternalFileOptions::link_files` that hard links input files and preserves original file links after ingestion, unlike `move_files` which will unlink input files after ingestion. This can be useful when being used together with `allow_db_generated_files` to ingest files from another DB. Also reverted the change to `move_files` in https://github.com/facebook/rocksdb/issues/12959 to simplify the contract so that it will always unlink input files without exception. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12980 Test Plan: updated unit test `ExternSSTFileLinkFailFallbackTest.LinkFailFallBackExternalSst` to test that input files will not be unlinked. Reviewed By: pdillinger Differential Revision: D61925111 Pulled By: cbi42 fbshipit-source-id: eadaca72e1ae5288bdd195d57158466e5656fa62
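A hedged usage sketch of the new option next to the existing ones it complements; file paths, the `db` pointer, and the helper name are placeholders.

```
#include <string>
#include <vector>
#include "rocksdb/db.h"

using namespace ROCKSDB_NAMESPACE;

Status IngestWithHardLinks(DB* db, const std::vector<std::string>& files) {
  IngestExternalFileOptions ifo;
  ifo.link_files = true;    // hard link inputs, keep the original file links
  // ifo.move_files = true; // alternative: hard link, then unlink the originals

  // Useful together with allow_db_generated_files when ingesting SST files
  // produced by another DB rather than by SstFileWriter.
  ifo.allow_db_generated_files = true;
  return db->IngestExternalFile(files, ifo);
}
```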
2024-08-27 | Small CPU optimization in InlineSkipList::Insert() (#12975) | Changyu Bi
Summary: reuse the decoded key in more places to avoid decoding the length-prefixed key x->Key(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/12975 Test Plan: ran benchmarks simultaneously for "before" and "after" * fillseq: ``` (for I in $(seq 1 50); do ./db_bench --benchmarks=fillseq --disable_auto_compactions=1 --min_write_buffer_number_to_merge=100 --max_write_buffer_number=1000 --write_buffer_size=268435456 --num=5000000 --seed=1723056275 --disable_wal=1 2>&1 | grep "fillseq" done;) | awk '{ t += $5; c++; print } END { printf ("%9.3f\n", 1.0 * t / c) }'; before: 1483191 after: 1490555 (+0.5%) ``` * fillrandom: ``` (for I in $(seq 1 2); do ./db_bench_imain --benchmarks=fillrandom --disable_auto_compactions=1 --min_write_buffer_number_to_merge=100 --max_write_buffer_number=1000 --write_buffer_size=268435456 --num=2500000 --seed=1723056275 --disable_wal=1 2>&1 | grep "fillrandom" before: 255463 after: 256128 (+0.26%) ``` Reviewed By: anand1976 Differential Revision: D61835340 Pulled By: cbi42 fbshipit-source-id: 70345510720e348bacd51269acb5d2dd5a62bf0a
2024-08-27Retain previous trace file in db_stress for debugging purposes (#12978)anand76
Summary: There are several crash test failures due to DB verification failure. Retain some trace history in the expected state directory to make debugging easier. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12978 Reviewed By: cbi42 Differential Revision: D61864921 Pulled By: anand1976 fbshipit-source-id: 9f3f37b7e1e958bc89a3cf0373182354c2c1aa3b
2024-08-26Scope down workflow permissions (#12973)Jay Huh
Summary: Followed the instructions per https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#defining-access-for-the-github_token-scopes It turns out that we did not need any of these except `Metadata: read`. Before ``` GITHUB_TOKEN Permissions Actions: write Attestations: write Checks: write Contents: write Deployments: write Discussions: write Issues: write Metadata: read Packages: write Pages: write PullRequests: write RepositoryProjects: write SecurityEvents: write Statuses: write ``` After ``` GITHUB_TOKEN Permissions Metadata: read ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12973 Test Plan: GitHub Actions triggered by this PR Reviewed By: cbi42 Differential Revision: D61812651 Pulled By: jaykorean fbshipit-source-id: 4413756c93f503e8b2fb77eb8b684ef9e6a6c13d
2024-08-26Fix flaky test DBTest2.VariousFileTemperatures (#12974)Peter Dillinger
Summary: ... apparently due to potentially not purging obsolete files after CompactRange. Example: https://github.com/facebook/rocksdb/actions/runs/10564621261/job/29267393711?pr=12959 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12974 Test Plan: reproduced failure with USE_CLANG=1 COERCE_CONTEXT_SWITCH=1, now fixed Reviewed By: cbi42 Differential Revision: D61812600 Pulled By: pdillinger fbshipit-source-id: d4b23e1a179bb8ec39875ed7a8ce1649fa3344bd
2024-08-26Support ingesting db generated files using hard link (#12959)Changyu Bi
Summary: so `IngestExternalFileOptions::move_files` and `IngestExternalFileOptions::allow_db_generated_files` are now compatible. The original file links won't be removed if `allow_db_generated_files` is true. This is to prevent deleting files from another DB. There was a [comment](https://github.com/facebook/rocksdb/pull/12750#discussion_r1684509620) in https://github.com/facebook/rocksdb/issues/12750 about how exactly-once ingestion would work with `move_files`. I've discussed with the customer and decided that it can be done by reading the target DB to see if it contains any ingested key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12959 Test Plan: updated unit tests `IngestDBGeneratedFileTest*` to enable `move_files`. Reviewed By: jowlyzhang Differential Revision: D61703480 Pulled By: cbi42 fbshipit-source-id: 6b4294369767f989a2f36bbace4ca3c0257aeaf7
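As a rough sketch of the exactly-once check described above (the sentinel-key approach, helper name, and error handling are illustrative assumptions, not code from the PR):

```cpp
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Before re-running an ingestion that uses move_files, probe the target DB
// for a key known to be in the ingested files. If it is already present,
// assume a previous attempt succeeded and skip the ingestion.
rocksdb::Status IngestOnce(rocksdb::DB* db,
                           const std::vector<std::string>& files,
                           const std::string& sentinel_key) {
  std::string value;
  rocksdb::Status s = db->Get(rocksdb::ReadOptions(), sentinel_key, &value);
  if (s.ok()) {
    return rocksdb::Status::OK();  // already ingested by an earlier attempt
  }
  if (!s.IsNotFound()) {
    return s;  // a real error, not "key absent"
  }
  rocksdb::IngestExternalFileOptions opts;
  opts.move_files = true;
  opts.allow_db_generated_files = true;
  return db->IngestExternalFile(files, opts);
}
```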
2024-08-23Add Temperature to FileAttributes (#12965)Peter Dillinger
Summary: ... so that appropriate implementations can return temperature information from GetChildrenFileAttributes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12965 Test Plan: just an API placeholder for now Reviewed By: anand1976 Differential Revision: D61748199 Pulled By: pdillinger fbshipit-source-id: b457e324cb451e836611a0bf630c3da0f30a8abf
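A sketch of how a caller might consume the new information, assuming the temperature lands as a `temperature` member on `Env::FileAttributes` populated by `GetChildrenFileAttributes` (the directory path is illustrative):

```cpp
#include <iostream>
#include <string>
#include <vector>

#include "rocksdb/advanced_options.h"  // Temperature
#include "rocksdb/env.h"

int main() {
  rocksdb::Env* env = rocksdb::Env::Default();
  std::vector<rocksdb::Env::FileAttributes> children;
  // Returns name/size per child file; implementations that know file
  // temperatures can now also report them here.
  rocksdb::Status s = env->GetChildrenFileAttributes("/tmp/some_db", &children);
  if (!s.ok()) {
    std::cerr << s.ToString() << std::endl;
    return 1;
  }
  for (const auto& f : children) {
    if (f.temperature == rocksdb::Temperature::kCold) {
      std::cout << f.name << " is on cold storage" << std::endl;
    }
  }
  return 0;
}
```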
2024-08-23Options for file temperature for more files (#12957)Peter Dillinger
Summary: We have a request to use the cold tier as the primary source of truth for the DB. To best support such use cases and to complement the existing options controlling SST file temperatures, we add two new DB options: * `metadata_write_temperature` for DB "small" files that don't contain much user data * `wal_write_temperature` for WALs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12957 Test Plan: Unit test included, though it's hard to be sure we've covered all the places Reviewed By: jowlyzhang Differential Revision: D61664815 Pulled By: pdillinger fbshipit-source-id: 8e19c9dd8fd2db059bb15f74938d6bc12002e82b
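A short sketch of how the two new options might be set, assuming they are `Temperature`-typed fields on `Options`/`DBOptions` as named above (the path and chosen tiers are illustrative):

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Steer "small" DB metadata files (MANIFEST, OPTIONS, etc.) and WALs to
  // specific temperature tiers, complementing the existing per-SST options.
  options.metadata_write_temperature = rocksdb::Temperature::kWarm;
  options.wal_write_temperature = rocksdb::Temperature::kCold;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/tiered_db", &db);
  assert(s.ok());
  delete db;
  return 0;
}
```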
2024-08-23Disable benchmark-linux (#12964)Jay Huh
Summary: Disabling the job temporarily. We will re-enable it when it is ready again. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12964 Reviewed By: ltamasi Differential Revision: D61740941 Pulled By: jaykorean fbshipit-source-id: 167e50c4f5e38d508a8e56633261611467f30690
2024-08-23Fix MultiGet with timestamps (#12943)eniac1024
Summary: Issue: MultiGet(PinnableSlice) can't read out all timestamps. Fixed the impl and added a UT as well. In the original impl, if MultiGet reads multiple column families, a later column family would clean up the timestamps of the previous column family. Fix: https://github.com/facebook/rocksdb/issues/12950#issue-2476996580 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12943 Reviewed By: anand1976 Differential Revision: D61729257 Pulled By: pdillinger fbshipit-source-id: 55267c26076c8a59acedd27e14714711729a40df
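For context, a sketch of the usage pattern affected by the bug, assuming the `MultiGet` overload that takes a `timestamps` output array (the column family handles and keys are illustrative):

```cpp
#include <cassert>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Read one key from each of two column families and also retrieve the
// timestamps. The bug was that timestamps already filled in for an earlier
// column family could be cleared while processing a later one.
void MultiGetWithTimestamps(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cf1,
                            rocksdb::ColumnFamilyHandle* cf2) {
  constexpr size_t kNumKeys = 2;
  rocksdb::ColumnFamilyHandle* cfs[kNumKeys] = {cf1, cf2};
  rocksdb::Slice keys[kNumKeys] = {"key1", "key2"};
  rocksdb::PinnableSlice values[kNumKeys];
  std::string timestamps[kNumKeys];
  rocksdb::Status statuses[kNumKeys];

  rocksdb::ReadOptions read_opts;
  // For timestamp-enabled column families, read_opts.timestamp must also be
  // set to an appropriate read timestamp.
  db->MultiGet(read_opts, kNumKeys, cfs, keys, values, timestamps, statuses);

  for (size_t i = 0; i < kNumKeys; i++) {
    assert(statuses[i].ok() || statuses[i].IsNotFound());
  }
}
```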
2024-08-21Record largest seqno in table properties and verify in file ingestion (#12951)Changyu Bi
Summary: this helps to avoid scanning input files when ingesting db generated files: https://github.com/facebook/rocksdb/blob/ecb844babda9e669ff00a5d1ee51eda7430c9bc0/db/external_sst_file_ingestion_job.cc#L917-L935 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12951 Test Plan: * `IngestDBGeneratedFileTest.FailureCase` is updated to verify that this table property is verified during ingestion * existing unit tests for other ingestion use cases. Reviewed By: jowlyzhang Differential Revision: D61608285 Pulled By: cbi42 fbshipit-source-id: b5b7aae9741531349ab247be6ffaa3f3628b76ca
2024-08-21Fix build for macos-arm64-macosx-clang17-no-san (#12949)Jay Huh
Summary: When merged into internal code base we see the following error. This should fix it. ``` Actions failed: [2024-08-20T07:45:53.879-07:00] Action failed: fbcode//rocksdb/src:rocksdb_lib (cfg:macos-arm64-macosx-clang17-no-san#e5847010950663ca) (cxx_compile util/write_batch_util.cc) [2024-08-20T07:45:53.879-07:00] Remote command returned non-zero exit code 1 [2024-08-20T07:45:53.879-07:00] Remote action, reproduce with: `frecli cas download-action 2fe3749f2d3ea6107cce103d4e2be1dcc76a9df797bae308cde5eaccc65201b7:145` fbcode/rocksdb/src/include/rocksdb/write_batch.h:460:14: error: no template named 'unordered_map' in namespace 'std'; did you mean 'unordered_set'? const std::unordered_map<uint32_t, size_t>& GetColumnFamilyToTimestampSize() { ~~~~~^~~~~~~~~~~~~ fbcode/rocksdb/src/include/rocksdb/write_batch.h:540:8: error: no template named 'unordered_map' in namespace 'std'; did you mean 'unordered_set'? std::unordered_map<uint32_t, size_t> cf_id_to_ts_sz_; ~~~~~^~~~~~~~~~~~~ /paragon/pods/259551525/home/execution/3/202ac945754041b6bc424b0c35e42c9d/work/buck-out/v2/gen/fbsource/a90614bbe22ec1d7/xplat/toolchains/minimal_xcode/__clang_genrule__/out/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__memory/compressed_pair.h:113:3: error: static_assert failed due to requirement '!is_same<unsigned long, unsigned long>::value' "__compressed_pair cannot be instantiated when T1 and T2 are the same type; The current implementation is NOT ABI-compatible with the previous implementation for this configuration" static_assert((!is_same<_T1, _T2>::value), ^ ~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12949 Test Plan: CI Reviewed By: jowlyzhang, cbi42 Differential Revision: D61577604 Pulled By: jaykorean fbshipit-source-id: 3584a2cd550a303346d80ccc5cc90f4a9b3e2da2
2024-08-21Add some documentation for version edit handlers (#12948)Yu Zhang
Summary: As titled. No functional change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12948 Reviewed By: hx235 Differential Revision: D61551254 Pulled By: jowlyzhang fbshipit-source-id: ccf53d78bd2f18f174d7e61972e5de467c96ce76
2024-08-20Enable Continuous Benchmarking of RocksDB (#12885)Radek Hubner
Summary: This pull request transitions the benchmarking process from CircleCI to GitHub Actions. The benchmarking jobs will now be executed on a self-hosted runner. Unlike the previous CircleCI configuration, where jobs were queued due to the long execution time (nearly 60 minutes per job), the new setup schedules the benchmarking tasks to run every two hours. Closes https://github.com/facebook/rocksdb/issues/12615 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12885 Reviewed By: pdillinger Differential Revision: D61422468 Pulled By: jaykorean fbshipit-source-id: 10535865c849797825f9652e4e9ef367b3d73599