summaryrefslogtreecommitdiff
path: root/CMakeLists.txt
diff options
context:
space:
mode:
authorPeter Dillinger <peterd@fb.com>2023-05-09 22:25:45 -0700
committerFacebook GitHub Bot <facebook-github-bot@users.noreply.github.com>2023-05-09 22:25:45 -0700
commit459969e9936749e1a016d7c2d05abaa11df004db (patch)
tree62fc856d9b5885ec7bb3e0e3d6025b31d9f065a2 /CMakeLists.txt
parentf4a02f2c5260d435b9bc94bb7f2aa93711c64ae0 (diff)
Simplify detection of x86 CPU features (#11419)
Summary: **Background** - runtime detection of certain x86 CPU features was added for optimizing CRC32c checksums, where performance is dramatically affected by the availability of certain CPU instructions and code using intrinsics for those instructions. And Java builds with native library try to be broadly compatible but performant. What has changed is that CRC32c is no longer the most efficient cheecksum on contemporary x86_64 hardware, nor the default checksum. XXH3 is generally faster and not as dramatically impacted by the availability of certain CPU instructions. For example, on my Skylake system using db_bench (similar on an older Skylake system without AVX512): PORTABLE=1 empty USE_SSE : xxh3->8 GB/s crc32c->0.8 GB/s (no SSE4.2 nor AVX2 instructions) PORTABLE=1 USE_SSE=1 : xxh3->19 GB/s crc32c->16 GB/s (with SSE4.2 and AVX2) PORTABLE=0 USE_SSE ignored: xxh3->28 GB/s crc32c->16 GB/s (also some AVX512) Testing a ~10 year old system, with SSE4.2 but without AVX2, crc32c is a similar speed to the new systems but xxh3 is only about half that speed, also 8GB/s like the non-AVX2 compile above. Given that xxh3 has specific optimization for AVX2, I think we can infer that that crc32c is only fastest for that ~2008-2013 period when SSE4.2 was included but not AVX2. And given that xxh3 is only about 2x slower on these systems (not like >10x slower for unoptimized crc32c), I don't think we need to invest too much in optimally adapting to these old cases. x86 hardware that doesn't support fast CRC32c is now extremely rare, so requiring a custom build to support such hardware is fine IMHO. **This change** does two related things: * Remove runtime CPU detection for optimizing CRC32c on x86. Maintaining this code is non-zero work, and compiling special code that doesn't work on the configured target instruction set for code generation is always dubious. (On the one hand we have to ensure the CRC32c code uses SSE4.2 but on the other hand we have to ensure nothing else does.) * Detect CPU features in source code, not in build scripts. Although there are some hypothetical advantages to detectiong in build scripts (compiler generality), RocksDB supports at least three build systems: make, cmake, and buck. It's not practical to support feature detection on all three, and we have suffered from missed optimization opportunities by relying on missing or incomplete detection in cmake and buck. We also depend on some components like xxhash that do source code detection anyway. **In more detail:** * `HAVE_SSE42`, `HAVE_AVX2`, and `HAVE_PCLMUL` replaced by standard macros `__SSE4_2__`, `__AVX2__`, and `__PCLMUL__`. * MSVC does not provide high fidelity defines for SSE, PCLMUL, or POPCNT, but we can infer those from `__AVX__` or `__AVX2__` in a compatibility header. In rare cases of false negative or false positive feature detection, a build engineer should be able to set defines to work around the issue. * `__POPCNT__` is another standard define, but we happen to only need it on MSVC, where it is set by that compatibility header, or can be set by the build engineer. * `PORTABLE` can be set to a CPU type, e.g. "haswell", to compile for that CPU type. * `USE_SSE` is deprecated, now equivalent to PORTABLE=haswell, which roughly approximates its old behavior. Notably, this change should enable more builds to use the AVX2-optimized Bloom filter implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11419 Test Plan: existing tests, CI Manual performance tests after the change match the before above (none expected with make build). We also see AVX2 optimized Bloom filter code enabled when expected, by injecting a compiler error. (Performance difference is not big on my current CPU.) Reviewed By: ajkr Differential Revision: D45489041 Pulled By: pdillinger fbshipit-source-id: 60ceb0dd2aa3b365c99ed08a8b2a087a9abb6a70
Diffstat (limited to 'CMakeLists.txt')
-rw-r--r--CMakeLists.txt67
1 files changed, 15 insertions, 52 deletions
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 598c72815..109981c1b 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -253,33 +253,10 @@ if(CMAKE_SYSTEM_PROCESSOR MATCHES "loongarch64")
endif(HAS_LOONGARCH64)
endif(CMAKE_SYSTEM_PROCESSOR MATCHES "loongarch64")
-option(PORTABLE "build a portable binary" OFF)
-option(FORCE_SSE42 "force building with SSE4.2, even when PORTABLE=ON" OFF)
-option(FORCE_AVX "force building with AVX, even when PORTABLE=ON" OFF)
-option(FORCE_AVX2 "force building with AVX2, even when PORTABLE=ON" OFF)
-if(PORTABLE)
- add_definitions(-DROCKSDB_PORTABLE)
-
- # MSVC does not need a separate compiler flag to enable SSE4.2; if nmmintrin.h
- # is available, it is available by default.
- if(FORCE_SSE42 AND NOT MSVC)
- set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.2 -mpclmul")
- endif()
- if(MSVC)
- if(FORCE_AVX)
- set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX")
- endif()
- # MSVC automatically enables BMI / lzcnt with AVX2.
- if(FORCE_AVX2)
- set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
- endif()
- else()
- if(FORCE_AVX)
- set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx")
- endif()
- if(FORCE_AVX2)
- set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx2 -mbmi -mlzcnt")
- endif()
+set(PORTABLE 0 CACHE STRING "Minimum CPU arch to support, or 0 = current CPU, 1 = baseline CPU")
+if(PORTABLE STREQUAL 1)
+ # Usually nothing to do; compiler default is typically the most general
+ if(NOT MSVC)
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^s390x")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=z196")
endif()
@@ -287,10 +264,21 @@ if(PORTABLE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=loongarch64")
endif()
endif()
+elseif(PORTABLE MATCHES [^0]+)
+ # Name of a CPU arch spec or feature set to require
+ if(MSVC)
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:${PORTABLE}")
+ else()
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=${PORTABLE}")
+ endif()
else()
if(MSVC)
+ # NOTE: No auto-detection of current CPU, but instead assume some useful
+ # level of optimization is supported
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
else()
+ # Require instruction set from current CPU (with some legacy or opt-out
+ # exceptions)
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^s390x" AND NOT HAS_S390X_MARCH_NATIVE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=z196")
elseif(NOT CMAKE_SYSTEM_PROCESSOR MATCHES "^(powerpc|ppc)64" AND NOT HAS_ARMV8_CRC)
@@ -305,25 +293,6 @@ if(NOT MSVC)
set(CMAKE_REQUIRED_FLAGS "-msse4.2 -mpclmul")
endif()
-CHECK_CXX_SOURCE_COMPILES("
-#include <cstdint>
-#include <nmmintrin.h>
-#include <wmmintrin.h>
-int main() {
- volatile uint32_t x = _mm_crc32_u32(0, 0);
- const auto a = _mm_set_epi64x(0, 0);
- const auto b = _mm_set_epi64x(0, 0);
- const auto c = _mm_clmulepi64_si128(a, b, 0x00);
- auto d = _mm_cvtsi128_si64(c);
-}
-" HAVE_SSE42)
-if(HAVE_SSE42)
- add_definitions(-DHAVE_SSE42)
- add_definitions(-DHAVE_PCLMUL)
-elseif(FORCE_SSE42)
- message(FATAL_ERROR "FORCE_SSE42=ON but unable to compile with SSE4.2 enabled")
-endif()
-
# Check if -latomic is required or not
if (NOT MSVC)
set(CMAKE_REQUIRED_FLAGS "--std=c++17")
@@ -1010,12 +979,6 @@ if ( ROCKSDB_PLUGINS )
endforeach()
endif()
-if(HAVE_SSE42 AND NOT MSVC)
- set_source_files_properties(
- util/crc32c.cc
- PROPERTIES COMPILE_FLAGS "-msse4.2 -mpclmul")
-endif()
-
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(powerpc|ppc)64")
list(APPEND SOURCES
util/crc32c_ppc.c