Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.
* Cargo.toml: add no_sse2 feature
* benches/bench.rs: wire SSE2 benchmarks
* build.rs: add SSE2 rust intrinsics and assembly builds
* c/Makefile.testing: add SSE2 C and assembly targets
* c/README.md: add SSE2 to C build instructions
* c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
* c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
* c/blake3_dispatch.c: add SSE2 C dispatch
* c/blake3_impl.h: add SSE2 C function prototypes
* c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
* c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
assembly files starting with SSE4.1 version
* src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
* src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module
configurations
* src/platform.rs: add SSE2 rust platform detection and dispatch
* src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
* tools/instruction_set_support/src/main.rs: add SSE2 feature detection
Two main changes:
- In https://github.com/BLAKE3-team/BLAKE3/issues/79 we learned that
older versions of Clang require AVX-512 flags even to compile AVX-512
assembly.
- If the C compiler doesn't support AVX-512, we'd still prefer to build
the SSE4.1 and AVX2 assembly implementations, on x86_64.
There are two scenarios where compiling AVX-512 C or assembly code might
not work:
1. There might not be a C compiler installed at all. Most commonly this
is either in cross-compiling situations, or with the Windows GNU
target.
2. The installed C compiler might not support e.g. -mavx512f, because
it's too old.
In both of these cases, print a relevant warning, and then automatically
fall back to using the pure Rust intrinsics build.
Note that this only affects x86 targets. Other targets always use pure
Rust, unless the "neon" feature is enabled.
The biggest change here is that assembly implementations are enabled by
default.
Added features:
- "pure" (Pure Rust, with no C or assembly implementations.)
Removed features:
- "c" (Now basically the default.)
Renamed features;
- "c_prefer_intrinsics" -> "prefer_intrinsics"
- "c_neon" -> "neon"
Unchanged:
- "rayon"
- "std" (Still the only feature on by default.)
This gives the assembly files the same prefix as the intrinsics files which
simplifies building when the build system should pick between the assembly and
the intrinsics files.
If AVX-512 is enabled, and the local C compiler doesn't support it, the
build is going to fail. However, if we check for this explicitly, we can
give a better error message.
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/6.
The portable implementation was getting slowed down by converting back
and forth between words and bytes.
I made the corresponding change on the C side first
(12a37be8b5),
and as part of this commit I'm re-vendoring the C code. I'm also
exposing a small FFI interface to C so that blake3_neon.c can link
against portable.rs rather than blake3_portable.c, see c_neon.rs.