New methods:
- update_reader
- update_mmap
- update_mmap_rayon
These are more discoverable, more convenient, and safer.
There are two problems I want to avoid by taking a `Path` instead of a
`File`. First, exposing `Mmap` objects to the caller is fundamentally
unsafe, and making `maybe_mmap_file` private avoids that issue. Second,
taking a `File` raises questions about whether memory mapped reads
should behave like regular file reads. (Should they respect the current
seek position? Should they update the seek position?) Taking a `Path`
from the caller and opening the `File` internally avoids these
questions.
I'm adding the i32::MAX test case here because I personally screwed it
up while I was working on
https://github.com/BLAKE3-team/BLAKE3/issues/271. The correct
implementation of the carry bit is the ANDNOT of old high bit (1) and
the new high bit (0). Using XOR instead of ANDNOT gives the correct
answer in the overflow case, but it also reports an incorrect "extra"
overflow when the high bit goes from 0 to 1.
The SSE2 patch introduced xmm10 as a temporary register for one of the
rotations, but xmm6-xmm15 are callee-save registers on Windows, and
SSE4.1 was only saving the registers it used. The minimal fix is to use
one of the saved registers instead of xmm10.
See https://github.com/BLAKE3-team/BLAKE3/issues/206.
These clutter the toplevel API, and their prominence might lead callers
to prefer them as a first resort, which probably isn't a good idea.
Restricting multithreading to `Hasher::update_rayon` feels better,
similar to what we've done with `Hasher::finalize_xof`. (But I think
`update_rayon` is still an improvement over the trait-based interface
that it replaced.)
We use a counter value that's very close to wrapping the lower word,
when we're testing the hash_many chunks case. It turns out that this is
a useful thing to do with parents too, even though parents 1) are
teeechnically supposed to always use a counter of 0, and 2) aren't going
to increment the counter at all. We caught a bug in the assembly
implementations this way (where we accidentally did increment the
counter, but only the higher word), because the equivalent test in
rust_c_bindings uses this eccentric parents counter value.
This is a new interface that allows the caller to provide a
multi-threading implementation. It's defined in terms of a new `Join`
trait, for which we provide two implementations, `SerialJoin` and
`RayonJoin`. This lets the caller control when multi-threading is used,
rather than the previous all-or-nothing design of the "rayon" feature.
Although existing callers should keep working, this is a compatibility
break, because callers who were relying on automatic multi-threading
before will now be single-threaded. Thus the next release of this crate
will need to be version 0.2.
See https://github.com/BLAKE3-team/BLAKE3/issues/25 and
https://github.com/BLAKE3-team/BLAKE3/issues/54.
One thing I like to test is that, if I hack simd_degree to be higher
than MAX_SIMD_DEGREE, assertions fire. This requires a test case long
enough to exceed that number of chunks.
The previous version of this API called for a key of exactly 256 bits.
That's good for optimal performance, but it would mean losing the
use-with-other-algorithms property for applications whose input keys are
a different size. There's no way for an abstraction over the previous
version to provide reliable domain separation for the "extract" step.
As part of this, get rid of the BLAKE3_FUZZ_ITERATIONS variable. I
wasn't using it anywhere, and it was leading to some compiler warnings
in --no-default-features mode.
The portable implementation was getting slowed down by converting back
and forth between words and bytes.
I made the corresponding change on the C side first
(12a37be8b5),
and as part of this commit I'm re-vendoring the C code. I'm also
exposing a small FFI interface to C so that blake3_neon.c can link
against portable.rs rather than blake3_portable.c, see c_neon.rs.