author | Igor Canadi <icanadi@fb.com> | 2014-05-10 10:49:33 -0700 |
---|---|---|
committer | Igor Canadi <icanadi@fb.com> | 2014-05-10 10:49:33 -0700 |
commit | 038a477b5390aeac3491c1a8743417c66917b184 | |
tree | ee251527e8ccb9b382669019a5aa32f1636e0e15 | |
parent | acd17fd002c26f4184353c3faea11d5424c41470 |
Make it easier to start using RocksDB
Summary:
This diff addresses several things, all with one goal: making RocksDB easier to use.
* Add some functions to Options that make RocksDB easier to tune.
* Add example programs for basic RocksDB usage and for RocksDB with column families.
* Rewrite our README (the plain-text README is replaced by a new README.md).
Regarding Options, I took a stab at something we have been talking about for a long time:
* https://www.facebook.com/groups/rocksdb.dev/permalink/563169950448190/
I added the following functions:
* IncreaseParallelism(total_threads) -- easy: enlarges the background thread pool and raises max_background_compactions.
* OptimizeLevelStyleCompaction(memtable_memory_budget) -- the easiest way to tune RocksDB for fewer write stalls with level style compaction. This is very likely not the ideal configuration; feel free to suggest improvements. I used some of Mark's suggestions from here: https://github.com/facebook/rocksdb/issues/54
* OptimizeUniversalStyleCompaction(memtable_memory_budget) -- the same kind of tuning for universal style compaction.
* OptimizeForPointLookup() -- for workloads that never use an iterator, only Put() and Get().
Test Plan: compiled RocksDB and ran the examples.
Reviewers: dhruba, MarkCallaghan, haobo, sdong, yhchiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18621
-rw-r--r-- | README | 82
-rw-r--r-- | README.md | 24
-rw-r--r-- | examples/.gitignore | 2
-rw-r--r-- | examples/Makefile | 9
-rw-r--r-- | examples/README.md | 1
-rw-r--r-- | examples/column_families_example.cc | 72
-rw-r--r-- | examples/simple_example.cc | 41
-rw-r--r-- | include/rocksdb/options.h | 33
-rw-r--r-- | util/options.cc | 64
9 files changed, 246 insertions, 82 deletions
diff --git a/README b/README
deleted file mode 100644
index 473e4145b..000000000
--- a/README
+++ /dev/null
@@ -1,82 +0,0 @@
-rocksdb: A persistent key-value store for flash storage
-Authors: * The Facebook Database Engineering Team
-         * Build on earlier work on leveldb by Sanjay Ghemawat
-           (sanjay@google.com) and Jeff Dean (jeff@google.com)
-
-This code is a library that forms the core building block for a fast
-key value server, especially suited for storing data on flash drives.
-It has an Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
-between Write-Amplification-Factor(WAF), Read-Amplification-Factor (RAF)
-and Space-Amplification-Factor(SAF). It has multi-threaded compactions,
-making it specially suitable for storing multiple terabytes of data in a
-single database.
-
-The core of this code has been derived from open-source leveldb.
-
-The code under this directory implements a system for maintaining a
-persistent key/value store.
-
-See doc/index.html and github wiki (https://github.com/facebook/rocksdb/wiki)
-for more explanation.
-
-The public interface is in include/*. Callers should not include or
-rely on the details of any other header files in this package. Those
-internal APIs may be changed without warning.
-
-Guide to header files:
-
-include/rocksdb/db.h
-    Main interface to the DB: Start here
-
-include/rocksdb/options.h
-    Control over the behavior of an entire database, and also
-    control over the behavior of individual reads and writes.
-
-include/rocksdb/comparator.h
-    Abstraction for user-specified comparison function. If you want
-    just bytewise comparison of keys, you can use the default comparator,
-    but clients can write their own comparator implementations if they
-    want custom ordering (e.g. to handle different character
-    encodings, etc.)
-
-include/rocksdb/iterator.h
-    Interface for iterating over data. You can get an iterator
-    from a DB object.
-
-include/rocksdb/write_batch.h
-    Interface for atomically applying multiple updates to a database.
-
-include/rocksdb/slice.h
-    A simple module for maintaining a pointer and a length into some
-    other byte array.
-
-include/rocksdb/status.h
-    Status is returned from many of the public interfaces and is used
-    to report success and various kinds of errors.
-
-include/rocksdb/env.h
-    Abstraction of the OS environment. A posix implementation of
-    this interface is in util/env_posix.cc
-
-include/rocksdb/table_builder.h
-    Lower-level modules that most clients probably won't use directly
-
-include/rocksdb/cache.h
-    An API for the block cache.
-
-include/rocksdb/compaction_filter.h
-    An API for a application filter invoked on every compaction.
-
-include/rocksdb/filter_policy.h
-    An API for configuring a bloom filter.
-
-include/rocksdb/memtablerep.h
-    An API for implementing a memtable.
-
-include/rocksdb/statistics.h
-    An API to retrieve various database statistics.
-
-include/rocksdb/transaction_log.h
-    An API to retrieve transaction logs from a database.
-
-Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
diff --git a/README.md b/README.md
new file mode 100644
index 000000000..b5a17f39a
--- /dev/null
+++ b/README.md
@@ -0,0 +1,24 @@
+## RocksDB: A Persistent Key-Value Store for Flash and RAM Storage
+
+RocksDB is developed and maintained by Facebook Database Engineering Team.
+It is built on on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com)
+and Jeff Dean (jeff@google.com)
+
+This code is a library that forms the core building block for a fast
+key value server, especially suited for storing data on flash drives.
+It has an Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
+between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
+and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
+making it specially suitable for storing multiple terabytes of data in a
+single database.
+
+Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples
+
+See [doc/index.html](https://github.com/facebook/rocksdb/blob/master/doc/index.html) and
+[github wiki](https://github.com/facebook/rocksdb/wiki) for more explanation.
+
+The public interface is in `include/`. Callers should not include or
+rely on the details of any other header files in this package. Those
+internal APIs may be changed without warning.
+
+Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
diff --git a/examples/.gitignore b/examples/.gitignore
new file mode 100644
index 000000000..d3c22099a
--- /dev/null
+++ b/examples/.gitignore
@@ -0,0 +1,2 @@
+column_families_example
+simple_example
diff --git a/examples/Makefile b/examples/Makefile
new file mode 100644
index 000000000..2567fdf86
--- /dev/null
+++ b/examples/Makefile
@@ -0,0 +1,9 @@
+include ../build_config.mk
+
+all: simple_example column_families_example
+
+simple_example: simple_example.cc
+	$(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)
+
+column_families_example: column_families_example.cc
+	$(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 000000000..b07b3903a
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1 @@
+Compile RocksDB first by executing `make static_lib` in parent dir
diff --git a/examples/column_families_example.cc b/examples/column_families_example.cc
new file mode 100644
index 000000000..2bdf6ec42
--- /dev/null
+++ b/examples/column_families_example.cc
@@ -0,0 +1,72 @@
+// Copyright (c) 2013, Facebook, Inc. All rights reserved.
+// This source code is licensed under the BSD-style license found in the
+// LICENSE file in the root directory of this source tree. An additional grant
+// of patent rights can be found in the PATENTS file in the same directory.
+#include <cstdio>
+#include <string>
+#include <vector>
+
+#include "rocksdb/db.h"
+#include "rocksdb/slice.h"
+#include "rocksdb/options.h"
+
+using namespace rocksdb;
+
+std::string kDBPath = "/tmp/rocksdb_column_families_example";
+
+int main() {
+  // open DB
+  Options options;
+  options.create_if_missing = true;
+  DB* db;
+  Status s = DB::Open(options, kDBPath, &db);
+  assert(s.ok());
+
+  // create column family
+  ColumnFamilyHandle* cf;
+  s = db->CreateColumnFamily(ColumnFamilyOptions(), "new_cf", &cf);
+  assert(s.ok());
+
+  // close DB
+  delete cf;
+  delete db;
+
+  // open DB with two column families
+  std::vector<ColumnFamilyDescriptor> column_families;
+  // have to open default column familiy
+  column_families.push_back(ColumnFamilyDescriptor(
+      kDefaultColumnFamilyName, ColumnFamilyOptions()));
+  // open the new one, too
+  column_families.push_back(ColumnFamilyDescriptor(
+      "new_cf", ColumnFamilyOptions()));
+  std::vector<ColumnFamilyHandle*> handles;
+  s = DB::Open(DBOptions(), kDBPath, column_families, &handles, &db);
+  assert(s.ok());
+
+  // put and get from non-default column family
+  s = db->Put(WriteOptions(), handles[1], Slice("key"), Slice("value"));
+  assert(s.ok());
+  std::string value;
+  s = db->Get(ReadOptions(), handles[1], Slice("key"), &value);
+  assert(s.ok());
+
+  // atomic write
+  WriteBatch batch;
+  batch.Put(handles[0], Slice("key2"), Slice("value2"));
+  batch.Put(handles[1], Slice("key3"), Slice("value3"));
+  batch.Delete(handles[0], Slice("key"));
+  s = db->Write(WriteOptions(), &batch);
+  assert(s.ok());
+
+  // drop column family
+  s = db->DropColumnFamily(handles[1]);
+  assert(s.ok());
+
+  // close db
+  for (auto handle : handles) {
+    delete handle;
+  }
+  delete db;
+
+  return 0;
+}
diff --git a/examples/simple_example.cc b/examples/simple_example.cc
new file mode 100644
index 000000000..20e7faa4b
--- /dev/null
+++ b/examples/simple_example.cc
@@ -0,0 +1,41 @@
+// Copyright (c) 2013, Facebook, Inc. All rights reserved.
+// This source code is licensed under the BSD-style license found in the
+// LICENSE file in the root directory of this source tree. An additional grant
+// of patent rights can be found in the PATENTS file in the same directory.
+#include <cstdio>
+#include <string>
+
+#include "rocksdb/db.h"
+#include "rocksdb/slice.h"
+#include "rocksdb/options.h"
+
+using namespace rocksdb;
+
+std::string kDBPath = "/tmp/rocksdb_simple_example";
+
+int main() {
+  DB* db;
+  Options options;
+  // Optimize RocksDB. This is the easiest way to get RocksDB to perform well
+  options.IncreaseParallelism();
+  options.OptimizeLevelStyleCompaction();
+  // create the DB if it's not already present
+  options.create_if_missing = true;
+
+  // open DB
+  Status s = DB::Open(options, kDBPath, &db);
+  assert(s.ok());
+
+  // Put key-value
+  s = db->Put(WriteOptions(), "key", "value");
+  assert(s.ok());
+  std::string value;
+  // get value
+  s = db->Get(ReadOptions(), "key", &value);
+  assert(s.ok());
+  assert(value == "value");
+
+  delete db;
+
+  return 0;
+}
diff --git a/include/rocksdb/options.h b/include/rocksdb/options.h
index 93dbf0d88..d48cdcc97 100644
--- a/include/rocksdb/options.h
+++ b/include/rocksdb/options.h
@@ -76,6 +76,29 @@ enum UpdateStatus { // Return status For inplace update callback
 struct Options;
 
 struct ColumnFamilyOptions {
+  // Some functions that make it easier to optimize RocksDB
+
+  // Use this if you don't need to keep the data sorted, i.e. you'll never use
+  // an iterator, only Put() and Get() API calls
+  ColumnFamilyOptions* OptimizeForPointLookup();
+
+  // Default values for some parameters in ColumnFamilyOptions are not
+  // optimized for heavy workloads and big datasets, which means you might
+  // observe write stalls under some conditions. As a starting point for tuning
+  // RocksDB options, use the following two functions:
+  // * OptimizeLevelStyleCompaction -- optimizes level style compaction
+  // * OptimizeUniversalStyleCompaction -- optimizes universal style compaction
+  // Universal style compaction is focused on reducing Write Amplification
+  // Factor for big data sets, but increases Space Amplification. You can learn
+  // more about the different styles here:
+  // https://github.com/facebook/rocksdb/wiki/Rocksdb-Architecture-Guide
+  // Note: we might use more memory than memtable_memory_budget during high
+  // write rate period
+  ColumnFamilyOptions* OptimizeLevelStyleCompaction(
+      uint64_t memtable_memory_budget = 512 * 1024 * 1024);
+  ColumnFamilyOptions* OptimizeUniversalStyleCompaction(
+      uint64_t memtable_memory_budget = 512 * 1024 * 1024);
+
   // -------------------
   // Parameters that affect behavior
 
@@ -336,6 +359,7 @@ struct ColumnFamilyOptions {
   // With bloomfilter and fast storage, a miss on one level
   // is very cheap if the file handle is cached in table cache
   // (which is true if max_open_files is large).
+  // Default: true
   bool disable_seek_compaction;
 
   // Puts are delayed 0-1 ms when any level has a compaction score that exceeds
@@ -546,6 +570,15 @@ struct ColumnFamilyOptions {
 };
 
 struct DBOptions {
+  // Some functions that make it easier to optimize RocksDB
+
+  // By default, RocksDB uses only one background thread for flush and
+  // compaction. Calling this function will set it up such that total of
+  // `total_threads` is used. Good value for `total_threads` is the number of
+  // cores. You almost definitely want to call this function if your system is
+  // bottlenecked by RocksDB.
+  DBOptions* IncreaseParallelism(int total_threads = 16);
+
   // If true, the database will be created if it is missing.
   // Default: false
   bool create_if_missing;
diff --git a/util/options.cc b/util/options.cc
index c8d1e3889..cc9571890 100644
--- a/util/options.cc
+++ b/util/options.cc
@@ -480,4 +480,68 @@ Options::PrepareForBulkLoad()
   return this;
 }
 
+// Optimization functions
+ColumnFamilyOptions* ColumnFamilyOptions::OptimizeForPointLookup() {
+  prefix_extractor.reset(NewNoopTransform());
+  BlockBasedTableOptions block_based_options;
+  block_based_options.index_type = BlockBasedTableOptions::kBinarySearch;
+  table_factory.reset(new BlockBasedTableFactory(block_based_options));
+  memtable_factory.reset(NewHashLinkListRepFactory());
+  return this;
+}
+
+ColumnFamilyOptions* ColumnFamilyOptions::OptimizeLevelStyleCompaction(
+    uint64_t memtable_memory_budget) {
+  write_buffer_size = memtable_memory_budget / 4;
+  // merge two memtables when flushing to L0
+  min_write_buffer_number_to_merge = 2;
+  // this means we'll use 50% extra memory in the worst case, but will reduce
+  // write stalls.
+  max_write_buffer_number = 6;
+  // start flushing L0->L1 as soon as possible. each file on level0 is
+  // (memtable_memory_budget / 2). This will flush level 0 when it's bigger than
+  // memtable_memory_budget.
+  level0_file_num_compaction_trigger = 2;
+  // doesn't really matter much, but we don't want to create too many files
+  target_file_size_base = memtable_memory_budget / 8;
+  // make Level1 size equal to Level0 size, so that L0->L1 compactions are fast
+  max_bytes_for_level_base = memtable_memory_budget;
+
+  // level style compaction
+  compaction_style = kCompactionStyleLevel;
+
+  // only compress levels >= 2
+  compression_per_level.resize(num_levels);
+  for (int i = 0; i < num_levels; ++i) {
+    if (i < 2) {
+      compression_per_level[i] = kNoCompression;
+    } else {
+      compression_per_level[i] = kSnappyCompression;
+    }
+  }
+  return this;
+}
+
+ColumnFamilyOptions* ColumnFamilyOptions::OptimizeUniversalStyleCompaction(
+    uint64_t memtable_memory_budget) {
+  write_buffer_size = memtable_memory_budget / 4;
+  // merge two memtables when flushing to L0
+  min_write_buffer_number_to_merge = 2;
+  // this means we'll use 50% extra memory in the worst case, but will reduce
+  // write stalls.
+  max_write_buffer_number = 6;
+  // universal style compaction
+  compaction_style = kCompactionStyleUniversal;
+  compaction_options_universal.compression_size_percent = 80;
+  return this;
+}
+
+DBOptions* DBOptions::IncreaseParallelism(int total_threads) {
+  max_background_compactions = total_threads - 1;
+  max_background_flushes = 1;
+  env->SetBackgroundThreads(total_threads, Env::LOW);
+  env->SetBackgroundThreads(1, Env::HIGH);
+  return this;
+}
+
 } // namespace rocksdb