Changes since 0.3.6:
- BUGFIX: The C implementation was incorrect on big endian systems for
inputs longer than 1024 bytes. This bug affected all previous versions
of the C implementation. Little endian platforms like x86 were
unaffected. The Rust implementation was also unaffected.
@jakub-zwolakowski and @pascal-cuoq from TrustInSoft reported this
bug: https://github.com/BLAKE3-team/BLAKE3/pull/118
- BUGFIX: The C build on x86-64 was producing binaries with an
executable stack. @tristanheaven reported this bug:
https://github.com/BLAKE3-team/BLAKE3/issues/109
- @mkrupcale added optimized implementations for SSE2. This improves
performance on older x86 processors that don't support SSE4.1.
- The C implementation now exposes the
`blake3_hasher_init_derive_key_raw` function, to make it easier to
implement language bindings. Added by @k0001.
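As an illustration of the byte-order issue behind the big endian bugfix above, here is a minimal Rust sketch of endian-independent word handling. It is not the actual C fix from the linked pull request; the helper names `words_from_le_bytes` and `le_bytes_from_words` are hypothetical. It relies only on the fact that BLAKE3 treats its 64-byte blocks and 32-byte chaining values as little-endian 32-bit words.

```rust
/// Read sixteen little-endian 32-bit words out of a 64-byte block.
/// `u32::from_le_bytes` fixes the byte order explicitly, so the result is
/// the same on little endian and big endian hosts. (Hypothetical helper.)
fn words_from_le_bytes(block: &[u8; 64]) -> [u32; 16] {
    let mut words = [0u32; 16];
    for (word, chunk) in words.iter_mut().zip(block.chunks_exact(4)) {
        *word = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    words
}

/// The inverse direction: serialize eight 32-bit words as 32 little-endian
/// bytes, again without depending on the host byte order.
/// (Hypothetical helper.)
fn le_bytes_from_words(words: &[u32; 8]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (chunk, word) in bytes.chunks_exact_mut(4).zip(words.iter()) {
        chunk.copy_from_slice(&word.to_le_bytes());
    }
    bytes
}
```

Code that instead copies raw memory between byte buffers and word arrays gives the same answer only on little endian hosts, which is the general class of mistake this kind of helper avoids.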
Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.
* Cargo.toml: add no_sse2 feature
* benches/bench.rs: wire SSE2 benchmarks
* build.rs: add SSE2 rust intrinsics and assembly builds
* c/Makefile.testing: add SSE2 C and assembly targets
* c/README.md: add SSE2 to C build instructions
* c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
* c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
* c/blake3_dispatch.c: add SSE2 C dispatch
* c/blake3_impl.h: add SSE2 C function prototypes
* c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
* c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
assembly files starting with SSE4.1 version
* src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
* src/lib.rs: add module configurations for the SSE2 rust intrinsics and the
SSE2 C rust bindings
* src/platform.rs: add SSE2 rust platform detection and dispatch
* src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
* tools/instruction_set_support/src/main.rs: add SSE2 feature detection
Changes since 0.3.4:
- The `digest` dependency is now v0.9 and the `crypto-mac` dependency is
now v0.8.
- Intel CET is supported in the assembly implementations.
- `b3sum` error output includes filepaths again.
Changes since 0.3.3:
- `b3sum` now supports the `--check` flag. This is intended to be a
drop-in replacement for e.g. `md5sum --check` from Coreutils. The
behavior is somewhat stricter than Coreutils with respect to invalid
Unicode in filenames. For a complete description of how `--check`
works, see the file `b3sum/what_does_check_do.md`.
- To support the `--check` feature, backslashes and newlines that appear
in filenames are now escaped in the output of `b3sum`. This is done
the same way as in Coreutils.
- To support `--check` interoperability between Unix and Windows,
backslashes in filepaths on Windows are now replaced with forward
slashes in the output of `b3sum`. Note that this is different from
Coreutils.
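The escaping rules above can be sketched roughly as follows. This is an illustration, not the `b3sum` source: the helper names `escape_filename` and `format_line` are hypothetical, and the Coreutils convention assumed here is that a backslash becomes `\\`, a newline becomes `\n`, and a line containing any escape is prefixed with a single backslash.

```rust
/// Escape a filename for checksum output. Returns the escaped name and
/// whether any escaping was applied. (Hypothetical helper.)
fn escape_filename(name: &str, is_windows: bool) -> (String, bool) {
    let mut escaped = String::with_capacity(name.len());
    let mut changed = false;
    for c in name.chars() {
        match c {
            // On Windows, backslashes are path separators and are replaced
            // with forward slashes rather than escaped.
            '\\' if is_windows => escaped.push('/'),
            '\\' => {
                escaped.push_str("\\\\");
                changed = true;
            }
            '\n' => {
                escaped.push_str("\\n");
                changed = true;
            }
            other => escaped.push(other),
        }
    }
    (escaped, changed)
}

/// Format one output line: hash, two spaces, filename. A leading backslash
/// marks lines whose filename was escaped. (Hypothetical helper.)
fn format_line(hex_hash: &str, name: &str, is_windows: bool) -> String {
    let (escaped, changed) = escape_filename(name, is_windows);
    let prefix = if changed { "\\" } else { "" };
    format!("{}{}  {}", prefix, hex_hash, escaped)
}
```

The leading backslash tells a `--check` reader that the filename on that line must be unescaped before it is opened.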
I've tested manually and found that 1.0.4 is the oldest version of `cc`
that builds successfully for us. (Version 1.0.3 is missing the
`is_flag_supported` method.)
This change might help with
https://github.com/BLAKE3-team/BLAKE3/issues/83. That said, the
underlying issue there is related to "minimum supported Rust versions",
and `blake3` does not yet have an MSRV other than latest stable.
Changes since 0.3.0:
- The x86 build now automatically falls back to "pure" Rust intrinsics,
under either of two possible conditions:
1. The `cc` crate fails to invoke a C compiler at all, indicating that
nothing of the right name (e.g. "cc" or "$CC" on Unix) is installed.
2. The `cc` crate detects that the compiler doesn't support AVX-512
flags, usually because it's too old.
The end result should be that most callers successfully build the
assembly implementations, and that callers who can't build those see a
warning but not an error. (And note that Cargo suppresses warnings for
non-path dependencies.)
There are two scenarios where compiling AVX-512 C or assembly code might
not work:
1. There might not be a C compiler installed at all. Most commonly this
is either in cross-compiling situations, or with the Windows GNU
target.
2. The installed C compiler might not support e.g. -mavx512f, because
it's too old.
In both of these cases, print a relevant warning, and then automatically
fall back to using the pure Rust intrinsics build.
Note that this only affects x86 targets. Other targets always use pure
Rust, unless the "neon" feature is enabled.
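The fallback decision for x86 targets described above can be sketched as plain logic. The type `BuildPlan` and the function `choose_build_plan` are hypothetical and are not taken from the real build script; in an actual `build.rs` the two inputs would come from probing the compiler, for example via the `is_flag_supported` method of the `cc` crate.

```rust
/// Which implementation the x86 build ends up using. (Hypothetical type.)
#[derive(Debug, PartialEq)]
enum BuildPlan {
    /// Build the optimized assembly implementations.
    Assembly,
    /// Fall back to pure Rust intrinsics and print a warning, not an error.
    PureRust { warning: &'static str },
}

/// Decide between the assembly build and the pure Rust fallback.
/// (Hypothetical helper; x86 targets only.)
fn choose_build_plan(compiler_found: bool, avx512_flags_supported: bool) -> BuildPlan {
    if !compiler_found {
        // Scenario 1: no C compiler could be invoked at all.
        return BuildPlan::PureRust {
            warning: "no C compiler found, falling back to pure Rust",
        };
    }
    if !avx512_flags_supported {
        // Scenario 2: the compiler is too old for the AVX-512 flags.
        return BuildPlan::PureRust {
            warning: "C compiler lacks AVX-512 support, falling back to pure Rust",
        };
    }
    BuildPlan::Assembly
}
```

Both failure scenarios lead to the same fallback, so the caller only has to handle two outcomes.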
Changes since version 0.2.3:
- The optimized assembly implementations are now built by default. They
perform better than the intrinsics implementations, and they compile
much more quickly. Bringing the default behavior in line with reported
benchmark figures should also simplify things for people running their
own benchmarks. Previously this crate only built Rust intrinsics
implementations by default, and the assembly implementations were
gated by the (slightly confusingly named) "c" feature. Now the "c"
feature is gone, and applications that need the old behavior can use
the new "pure" feature. Mainly this will be applications that don't
want to require a C compiler. Note that the `b3sum` crate previously
activated the "c" feature by default, so its behavior hasn't changed.
The biggest change here is that assembly implementations are enabled by
default.
Added features:
- "pure" (Pure Rust, with no C or assembly implementations.)
Removed features:
- "c" (Now basically the default.)
Renamed features:
- "c_prefer_intrinsics" -> "prefer_intrinsics"
- "c_neon" -> "neon"
Unchanged:
- "rayon"
- "std" (Still the only feature on by default.)
Changes since version 0.2.2:
- Bug fix: Commit 13556be fixes a crash on Windows when using the SSE4.1
assembly implementation (--features=c, set by default for b3sum). The
bug is undefined behavior and therefore a potential security issue.
- b3sum now supports the --num-threads flag.
- The C API now includes a blake3_hasher_finalize_seek() function, which
returns output from any position in the extended output stream.
- Build fix: Commit 5fad419 fixes a compiler error in the AVX-512 C
intrinsics implementation targeting the Windows GNU ABI.
Changes since 0.2.1 (and since c-0.2.0):
- Fix a performance issue when the caller makes multiple calls to
update() with uneven lengths. (#69, reported by @willbryant.)
Changes since 0.1.5:
- The `c_avx512` feature has been replaced by the `c` feature. In
addition to providing AVX-512 support, `c` also provides optimized
assembly implementations. These assembly implementations perform
better, perform more consistently across compilers, and compile more
quickly. As before, `c` is off by default, but the `b3sum` binary
crate activates it by default.
- The `rayon` feature no longer affects the entire API. Instead, it
provides the `join::RayonJoin` type for use with
`Hasher::update_with_join`, so that the caller can control when
multi-threading happens. Standalone API functions like `hash` are
always single-threaded now.
This is a new interface that allows the caller to provide a
multi-threading implementation. It's defined in terms of a new `Join`
trait, for which we provide two implementations, `SerialJoin` and
`RayonJoin`. This lets the caller control when multi-threading is used,
rather than the previous all-or-nothing design of the "rayon" feature.
Although existing callers should keep working, this is a compatibility
break, because callers who were relying on automatic multi-threading
before will now be single-threaded. Thus the next release of this crate
will need to be version 0.2.
See https://github.com/BLAKE3-team/BLAKE3/issues/25 and
https://github.com/BLAKE3-team/BLAKE3/issues/54.
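To make the `Join` design concrete, here is a simplified, standard-library-only sketch. The trait signature is an assumption modeled on the description above and may differ from the crate's. `ThreadJoin` is a hypothetical stand-in for `RayonJoin` that uses scoped threads instead of Rayon, and `checksum` is a toy divide-and-conquer function, not a hash.

```rust
use std::thread;

/// Simplified stand-in for the `Join` trait: run two closures and return
/// both results. Implementations decide whether they run in parallel.
trait Join {
    fn join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send;
}

/// Runs both closures one after the other on the calling thread.
enum SerialJoin {}

impl Join for SerialJoin {
    fn join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send,
    {
        (oper_a(), oper_b())
    }
}

/// Hypothetical parallel implementation using a scoped thread, standing in
/// for a Rayon-based one.
enum ThreadJoin {}

impl Join for ThreadJoin {
    fn join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send,
    {
        thread::scope(|s| {
            let handle_b = s.spawn(oper_b);
            let result_a = oper_a();
            let result_b = handle_b.join().expect("right-hand task panicked");
            (result_a, result_b)
        })
    }
}

/// Toy recursive computation that is generic over the join strategy.
fn checksum<J: Join>(data: &[u8]) -> u64 {
    if data.len() <= 1024 {
        return data.iter().map(|&b| b as u64).sum();
    }
    let (left, right) = data.split_at(data.len() / 2);
    let (a, b) = J::join(|| checksum::<J>(left), || checksum::<J>(right));
    a + b
}
```

The caller picks the strategy with a type parameter, for example `checksum::<SerialJoin>(&data)` or `checksum::<ThreadJoin>(&data)`, so the threading choice is made per call and not by a crate-wide feature.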
Changes since 0.1.3:
- Hasher supports the reset() method.
- Hasher implements several traits from the `digest` and `crypto_mac`
crates.
- Bug fixes in the C implementation for MSVC and for 32-bit x86.
Changes since 0.1.2:
- All x86 implementations include _mm_prefetch optimizations. These
improve performance for very large inputs.
- The C implementation performs parallel parent hashing, matching the
performance of the single-threaded Rust implementation.
- b3sum supports --no-mmap. Contributed by @cesarb.
Changes since 0.1.1:
- b3sum no longer mmaps files smaller than 16 KiB. This improves
performance for hashing many small files. Contributed by @xzfc.
- b3sum now supports --raw output. Contributed by @phayes.
Changes since 0.1.0:
- Optimizations contributed by @cesarb.
- Fix the build on x86_64-pc-windows-gnu when c_avx512 is enabled.
- Add an explicit error message for compilers that don't support c_avx512.
The generic constant_time_eq has several branches on the slice length,
which are not necessary when the slice length is known. However, the
optimizer is not allowed to look into the core of constant_time_eq, so
these branches cannot be elided.
Use instead a fixed-size variant of constant_time_eq, which has no
branches since the length is known.
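The difference between the two variants can be sketched as follows. This is an illustration only: the function names are hypothetical, and plain Rust like this carries no guarantee that the optimizer preserves constant-time behavior, which a real implementation has to protect separately.

```rust
/// Generic variant (hypothetical): must first branch on the slice lengths,
/// which are only known at run time.
fn constant_time_eq_slices(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for i in 0..a.len() {
        diff |= a[i] ^ b[i];
    }
    diff == 0
}

/// Fixed-size variant (hypothetical): the length is part of the type, so no
/// length check is needed and the loop bound is a compile-time constant.
fn constant_time_eq_32(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for i in 0..32 {
        // Accumulate differences without branching on the data.
        diff |= a[i] ^ b[i];
    }
    diff == 0
}
```

In both variants the comparison itself accumulates XOR differences instead of returning at the first mismatch; the fixed-size variant just removes the length handling around it.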