author    Igor Canadi <icanadi@fb.com>  2014-05-10 10:49:33 -0700
committer Igor Canadi <icanadi@fb.com>  2014-05-10 10:49:33 -0700
commit    038a477b5390aeac3491c1a8743417c66917b184 (patch)
tree      ee251527e8ccb9b382669019a5aa32f1636e0e15
parent    acd17fd002c26f4184353c3faea11d5424c41470 (diff)
Make it easier to start using RocksDB
Summary:
This diff is addressing multiple things with a single goal -- to make RocksDB easier to use:
* Add some functions to Options that make RocksDB easier to tune.
* Add example code for both simple RocksDB and RocksDB with Column Families.
* Rewrite our README.md

Regarding Options, I took a stab at something we talked about for a long time:
* https://www.facebook.com/groups/rocksdb.dev/permalink/563169950448190/

I added functions:
* IncreaseParallelism() -- easy, increases the thread pool and max_background_compactions.
* OptimizeLevelStyleCompaction(memtable_memory_budget) -- the easiest way to optimize RocksDB for fewer stalls with level style compaction. This is very likely not the ideal configuration; feel free to suggest improvements. I used some of Mark's suggestions from here: https://github.com/facebook/rocksdb/issues/54
* OptimizeUniversalStyleCompaction(memtable_memory_budget) -- optimize for universal compaction.

Test Plan: compiled rocksdb, ran examples.

Reviewers: dhruba, MarkCallaghan, haobo, sdong, yhchiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18621
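As a rough illustration of the new helpers beyond what the bundled examples cover, here is a minimal sketch (not part of this diff) that tunes a database for universal style compaction with an explicit memtable budget. The database path, the thread count of 8, and the 1GB budget are arbitrary illustrative values, not recommendations.

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

using namespace rocksdb;

int main() {
  Options options;
  // Size the background thread pool explicitly (the default argument is 16).
  options.IncreaseParallelism(8);
  // Tune for universal compaction with a 1GB memtable budget (illustrative).
  options.OptimizeUniversalStyleCompaction(1024ull * 1024 * 1024);
  options.create_if_missing = true;

  DB* db;
  Status s = DB::Open(options, "/tmp/rocksdb_universal_sketch", &db);
  assert(s.ok());

  s = db->Put(WriteOptions(), "key", "value");
  assert(s.ok());

  delete db;
  return 0;
}
```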
-rw-r--r--  README                               |  82
-rw-r--r--  README.md                            |  24
-rw-r--r--  examples/.gitignore                  |   2
-rw-r--r--  examples/Makefile                    |   9
-rw-r--r--  examples/README.md                   |   1
-rw-r--r--  examples/column_families_example.cc  |  72
-rw-r--r--  examples/simple_example.cc           |  41
-rw-r--r--  include/rocksdb/options.h            |  33
-rw-r--r--  util/options.cc                      |  64
9 files changed, 246 insertions, 82 deletions
diff --git a/README b/README
deleted file mode 100644
index 473e4145b..000000000
--- a/README
+++ /dev/null
@@ -1,82 +0,0 @@
-rocksdb: A persistent key-value store for flash storage
-Authors: * The Facebook Database Engineering Team
- * Build on earlier work on leveldb by Sanjay Ghemawat
- (sanjay@google.com) and Jeff Dean (jeff@google.com)
-
-This code is a library that forms the core building block for a fast
-key value server, especially suited for storing data on flash drives.
-It has an Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
-between Write-Amplification-Factor(WAF), Read-Amplification-Factor (RAF)
-and Space-Amplification-Factor(SAF). It has multi-threaded compactions,
-making it specially suitable for storing multiple terabytes of data in a
-single database.
-
-The core of this code has been derived from open-source leveldb.
-
-The code under this directory implements a system for maintaining a
-persistent key/value store.
-
-See doc/index.html and github wiki (https://github.com/facebook/rocksdb/wiki)
-for more explanation.
-
-The public interface is in include/*. Callers should not include or
-rely on the details of any other header files in this package. Those
-internal APIs may be changed without warning.
-
-Guide to header files:
-
-include/rocksdb/db.h
- Main interface to the DB: Start here
-
-include/rocksdb/options.h
- Control over the behavior of an entire database, and also
- control over the behavior of individual reads and writes.
-
-include/rocksdb/comparator.h
- Abstraction for user-specified comparison function. If you want
- just bytewise comparison of keys, you can use the default comparator,
- but clients can write their own comparator implementations if they
- want custom ordering (e.g. to handle different character
- encodings, etc.)
-
-include/rocksdb/iterator.h
- Interface for iterating over data. You can get an iterator
- from a DB object.
-
-include/rocksdb/write_batch.h
- Interface for atomically applying multiple updates to a database.
-
-include/rocksdb/slice.h
- A simple module for maintaining a pointer and a length into some
- other byte array.
-
-include/rocksdb/status.h
- Status is returned from many of the public interfaces and is used
- to report success and various kinds of errors.
-
-include/rocksdb/env.h
- Abstraction of the OS environment. A posix implementation of
- this interface is in util/env_posix.cc
-
-include/rocksdb/table_builder.h
- Lower-level modules that most clients probably won't use directly
-
-include/rocksdb/cache.h
- An API for the block cache.
-
-include/rocksdb/compaction_filter.h
- An API for a application filter invoked on every compaction.
-
-include/rocksdb/filter_policy.h
- An API for configuring a bloom filter.
-
-include/rocksdb/memtablerep.h
- An API for implementing a memtable.
-
-include/rocksdb/statistics.h
- An API to retrieve various database statistics.
-
-include/rocksdb/transaction_log.h
- An API to retrieve transaction logs from a database.
-
-Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
diff --git a/README.md b/README.md
new file mode 100644
index 000000000..b5a17f39a
--- /dev/null
+++ b/README.md
@@ -0,0 +1,24 @@
+## RocksDB: A Persistent Key-Value Store for Flash and RAM Storage
+
+RocksDB is developed and maintained by the Facebook Database Engineering Team.
+It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com)
+and Jeff Dean (jeff@google.com).
+
+This code is a library that forms the core building block for a fast
+key-value server, especially suited for storing data on flash drives.
+It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
+between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
+and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
+making it especially suitable for storing multiple terabytes of data in a
+single database.
+
+Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples
+
+See [doc/index.html](https://github.com/facebook/rocksdb/blob/master/doc/index.html) and
+[github wiki](https://github.com/facebook/rocksdb/wiki) for more explanation.
+
+The public interface is in `include/`. Callers should not include or
+rely on the details of any other header files in this package. Those
+internal APIs may be changed without warning.
+
+Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
diff --git a/examples/.gitignore b/examples/.gitignore
new file mode 100644
index 000000000..d3c22099a
--- /dev/null
+++ b/examples/.gitignore
@@ -0,0 +1,2 @@
+column_families_example
+simple_example
diff --git a/examples/Makefile b/examples/Makefile
new file mode 100644
index 000000000..2567fdf86
--- /dev/null
+++ b/examples/Makefile
@@ -0,0 +1,9 @@
+include ../build_config.mk
+
+all: simple_example column_families_example
+
+simple_example: simple_example.cc
+ $(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)
+
+column_families_example: column_families_example.cc
+ $(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 000000000..b07b3903a
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1 @@
+Compile RocksDB first by executing `make static_lib` in the parent dir.
diff --git a/examples/column_families_example.cc b/examples/column_families_example.cc
new file mode 100644
index 000000000..2bdf6ec42
--- /dev/null
+++ b/examples/column_families_example.cc
@@ -0,0 +1,72 @@
+// Copyright (c) 2013, Facebook, Inc. All rights reserved.
+// This source code is licensed under the BSD-style license found in the
+// LICENSE file in the root directory of this source tree. An additional grant
+// of patent rights can be found in the PATENTS file in the same directory.
+#include <cstdio>
+#include <string>
+#include <vector>
+
+#include "rocksdb/db.h"
+#include "rocksdb/slice.h"
+#include "rocksdb/options.h"
+
+using namespace rocksdb;
+
+std::string kDBPath = "/tmp/rocksdb_column_families_example";
+
+int main() {
+ // open DB
+ Options options;
+ options.create_if_missing = true;
+ DB* db;
+ Status s = DB::Open(options, kDBPath, &db);
+ assert(s.ok());
+
+ // create column family
+ ColumnFamilyHandle* cf;
+ s = db->CreateColumnFamily(ColumnFamilyOptions(), "new_cf", &cf);
+ assert(s.ok());
+
+ // close DB
+ delete cf;
+ delete db;
+
+ // open DB with two column families
+ std::vector<ColumnFamilyDescriptor> column_families;
+ // have to open the default column family
+ column_families.push_back(ColumnFamilyDescriptor(
+ kDefaultColumnFamilyName, ColumnFamilyOptions()));
+ // open the new one, too
+ column_families.push_back(ColumnFamilyDescriptor(
+ "new_cf", ColumnFamilyOptions()));
+ std::vector<ColumnFamilyHandle*> handles;
+ s = DB::Open(DBOptions(), kDBPath, column_families, &handles, &db);
+ assert(s.ok());
+
+ // put and get from non-default column family
+ s = db->Put(WriteOptions(), handles[1], Slice("key"), Slice("value"));
+ assert(s.ok());
+ std::string value;
+ s = db->Get(ReadOptions(), handles[1], Slice("key"), &value);
+ assert(s.ok());
+
+ // atomic write
+ WriteBatch batch;
+ batch.Put(handles[0], Slice("key2"), Slice("value2"));
+ batch.Put(handles[1], Slice("key3"), Slice("value3"));
+ batch.Delete(handles[0], Slice("key"));
+ s = db->Write(WriteOptions(), &batch);
+ assert(s.ok());
+
+ // drop column family
+ s = db->DropColumnFamily(handles[1]);
+ assert(s.ok());
+
+ // close db
+ for (auto handle : handles) {
+ delete handle;
+ }
+ delete db;
+
+ return 0;
+}
diff --git a/examples/simple_example.cc b/examples/simple_example.cc
new file mode 100644
index 000000000..20e7faa4b
--- /dev/null
+++ b/examples/simple_example.cc
@@ -0,0 +1,41 @@
+// Copyright (c) 2013, Facebook, Inc. All rights reserved.
+// This source code is licensed under the BSD-style license found in the
+// LICENSE file in the root directory of this source tree. An additional grant
+// of patent rights can be found in the PATENTS file in the same directory.
+#include <cstdio>
+#include <string>
+
+#include "rocksdb/db.h"
+#include "rocksdb/slice.h"
+#include "rocksdb/options.h"
+
+using namespace rocksdb;
+
+std::string kDBPath = "/tmp/rocksdb_simple_example";
+
+int main() {
+ DB* db;
+ Options options;
+ // Optimize RocksDB. This is the easiest way to get RocksDB to perform well
+ options.IncreaseParallelism();
+ options.OptimizeLevelStyleCompaction();
+ // create the DB if it's not already present
+ options.create_if_missing = true;
+
+ // open DB
+ Status s = DB::Open(options, kDBPath, &db);
+ assert(s.ok());
+
+ // Put key-value
+ s = db->Put(WriteOptions(), "key", "value");
+ assert(s.ok());
+ std::string value;
+ // get value
+ s = db->Get(ReadOptions(), "key", &value);
+ assert(s.ok());
+ assert(value == "value");
+
+ delete db;
+
+ return 0;
+}
diff --git a/include/rocksdb/options.h b/include/rocksdb/options.h
index 93dbf0d88..d48cdcc97 100644
--- a/include/rocksdb/options.h
+++ b/include/rocksdb/options.h
@@ -76,6 +76,29 @@ enum UpdateStatus { // Return status For inplace update callback
struct Options;
struct ColumnFamilyOptions {
+ // Some functions that make it easier to optimize RocksDB
+
+ // Use this if you don't need to keep the data sorted, i.e. you'll never use
+ // an iterator, only Put() and Get() API calls
+ ColumnFamilyOptions* OptimizeForPointLookup();
+
+ // Default values for some parameters in ColumnFamilyOptions are not
+ // optimized for heavy workloads and big datasets, which means you might
+ // observe write stalls under some conditions. As a starting point for tuning
+ // RocksDB options, use the following two functions:
+ // * OptimizeLevelStyleCompaction -- optimizes level style compaction
+ // * OptimizeUniversalStyleCompaction -- optimizes universal style compaction
+ // Universal style compaction is focused on reducing Write Amplification
+ // Factor for big data sets, but increases Space Amplification. You can learn
+ // more about the different styles here:
+ // https://github.com/facebook/rocksdb/wiki/Rocksdb-Architecture-Guide
+ // Note: we might use more memory than memtable_memory_budget during high
+ // write rate periods
+ ColumnFamilyOptions* OptimizeLevelStyleCompaction(
+ uint64_t memtable_memory_budget = 512 * 1024 * 1024);
+ ColumnFamilyOptions* OptimizeUniversalStyleCompaction(
+ uint64_t memtable_memory_budget = 512 * 1024 * 1024);
+
// -------------------
// Parameters that affect behavior
@@ -336,6 +359,7 @@ struct ColumnFamilyOptions {
// With bloomfilter and fast storage, a miss on one level
// is very cheap if the file handle is cached in table cache
// (which is true if max_open_files is large).
+ // Default: true
bool disable_seek_compaction;
// Puts are delayed 0-1 ms when any level has a compaction score that exceeds
@@ -546,6 +570,15 @@ struct ColumnFamilyOptions {
};
struct DBOptions {
+ // Some functions that make it easier to optimize RocksDB
+
+ // By default, RocksDB uses only one background thread for flush and
+ // compaction. Calling this function will set it up such that total of
+ // `total_threads` is used. Good value for `total_threads` is the number of
+ // cores. You almost definitely want to call this function if your system is
+ // bottlenecked by RocksDB.
+ DBOptions* IncreaseParallelism(int total_threads = 16);
+
// If true, the database will be created if it is missing.
// Default: false
bool create_if_missing;
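To make the per-column-family helpers above concrete, here is a small sketch (not part of this diff) that creates a column family tuned with the new OptimizeForPointLookup(); the database path and the column family name "lookup_cf" are made-up illustrative values.

```cpp
#include <cassert>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/slice.h"

using namespace rocksdb;

int main() {
  Options options;
  options.create_if_missing = true;
  DB* db;
  Status s = DB::Open(options, "/tmp/rocksdb_point_lookup_sketch", &db);
  assert(s.ok());

  // This column family is only ever accessed with Put()/Get(), never with an
  // iterator, so OptimizeForPointLookup() applies.
  ColumnFamilyOptions cf_options;
  cf_options.OptimizeForPointLookup();

  ColumnFamilyHandle* cf;
  s = db->CreateColumnFamily(cf_options, "lookup_cf", &cf);
  assert(s.ok());

  s = db->Put(WriteOptions(), cf, Slice("key"), Slice("value"));
  assert(s.ok());
  std::string value;
  s = db->Get(ReadOptions(), cf, Slice("key"), &value);
  assert(s.ok());

  delete cf;
  delete db;
  return 0;
}
```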
diff --git a/util/options.cc b/util/options.cc
index c8d1e3889..cc9571890 100644
--- a/util/options.cc
+++ b/util/options.cc
@@ -480,4 +480,68 @@ Options::PrepareForBulkLoad()
return this;
}
+// Optimization functions
+ColumnFamilyOptions* ColumnFamilyOptions::OptimizeForPointLookup() {
+ prefix_extractor.reset(NewNoopTransform());
+ BlockBasedTableOptions block_based_options;
+ block_based_options.index_type = BlockBasedTableOptions::kBinarySearch;
+ table_factory.reset(new BlockBasedTableFactory(block_based_options));
+ memtable_factory.reset(NewHashLinkListRepFactory());
+ return this;
+}
+
+ColumnFamilyOptions* ColumnFamilyOptions::OptimizeLevelStyleCompaction(
+ uint64_t memtable_memory_budget) {
+ write_buffer_size = memtable_memory_budget / 4;
+ // merge two memtables when flushing to L0
+ min_write_buffer_number_to_merge = 2;
+ // this means we'll use 50% extra memory in the worst case, but will reduce
+ // write stalls.
+ max_write_buffer_number = 6;
+ // start flushing L0->L1 as soon as possible. each file on level0 is
+ // (memtable_memory_budget / 2). This will flush level 0 when it's bigger than
+ // memtable_memory_budget.
+ level0_file_num_compaction_trigger = 2;
+ // doesn't really matter much, but we don't want to create too many files
+ target_file_size_base = memtable_memory_budget / 8;
+ // make Level1 size equal to Level0 size, so that L0->L1 compactions are fast
+ max_bytes_for_level_base = memtable_memory_budget;
+
+ // level style compaction
+ compaction_style = kCompactionStyleLevel;
+
+ // only compress levels >= 2
+ compression_per_level.resize(num_levels);
+ for (int i = 0; i < num_levels; ++i) {
+ if (i < 2) {
+ compression_per_level[i] = kNoCompression;
+ } else {
+ compression_per_level[i] = kSnappyCompression;
+ }
+ }
+ return this;
+}
+
+ColumnFamilyOptions* ColumnFamilyOptions::OptimizeUniversalStyleCompaction(
+ uint64_t memtable_memory_budget) {
+ write_buffer_size = memtable_memory_budget / 4;
+ // merge two memtables when flushing to L0
+ min_write_buffer_number_to_merge = 2;
+ // this means we'll use 50% extra memory in the worst case, but will reduce
+ // write stalls.
+ max_write_buffer_number = 6;
+ // universal style compaction
+ compaction_style = kCompactionStyleUniversal;
+ compaction_options_universal.compression_size_percent = 80;
+ return this;
+}
+
+DBOptions* DBOptions::IncreaseParallelism(int total_threads) {
+ max_background_compactions = total_threads - 1;
+ max_background_flushes = 1;
+ env->SetBackgroundThreads(total_threads, Env::LOW);
+ env->SetBackgroundThreads(1, Env::HIGH);
+ return this;
+}
+
} // namespace rocksdb
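To make the budget arithmetic in OptimizeLevelStyleCompaction() easier to follow, here is a worked example for the default 512MB memtable_memory_budget, derived directly from the assignments above; the numbers are approximate and are not additional tuning advice.

```cpp
// Worked numbers for memtable_memory_budget = 512MB (the default above).
const uint64_t budget = 512ull * 1024 * 1024;
const uint64_t write_buffer = budget / 4;      // 128MB per memtable
const uint64_t worst_case = 6 * write_buffer;  // 768MB with max_write_buffer_number = 6,
                                               // i.e. the "50% extra memory" worst case
const uint64_t l0_file = 2 * write_buffer;     // ~256MB per L0 file (two memtables merged)
const uint64_t l0_trigger = 2 * l0_file;       // ~512MB of L0 triggers L0->L1 compaction
const uint64_t level1_bytes = budget;          // max_bytes_for_level_base = 512MB,
                                               // so L0->L1 compactions stay cheap
const uint64_t target_file = budget / 8;       // 64MB target_file_size_base
```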