The biggest change here is that assembly implementations are enabled by
default.
Added features:
- "pure" (Pure Rust, with no C or assembly implementations.)
Removed features:
- "c" (Now basically the default.)
Renamed features:
- "c_prefer_intrinsics" -> "prefer_intrinsics"
- "c_neon" -> "neon"
Unchanged:
- "rayon"
- "std" (Still the only feature on by default.)
One thing I like to test is that, if I hack simd_degree to be higher
than MAX_SIMD_DEGREE, assertions fire. This requires a test case long
enough to exceed that number of chunks.
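A minimal sketch of why the input length matters for that test. Everything here is a hypothetical stand-in, not the crate's actual code: the value of MAX_SIMD_DEGREE, the chunk length, and the chunks_per_batch helper are assumptions made for illustration. The idea is that the batch size is capped by the number of chunks available, so an oversized simd_degree only trips the assertion once the input has more chunks than MAX_SIMD_DEGREE.

```rust
/// Assumed upper bound on SIMD parallelism (illustrative value only).
pub const MAX_SIMD_DEGREE: usize = 16;
/// Assumed chunk length in bytes (illustrative value only).
pub const CHUNK_LEN: usize = 1024;

/// Hypothetical helper: how many chunks would be hashed in one batch.
/// The batch is limited both by the reported SIMD degree and by the
/// number of whole chunks in the input.
pub fn chunks_per_batch(input_len: usize, simd_degree: usize) -> usize {
    let whole_chunks = input_len / CHUNK_LEN;
    let batch = whole_chunks.min(simd_degree);
    // This is the kind of assertion the test wants to see fire. With a
    // short input, `batch` stays small and a bogus `simd_degree` goes
    // unnoticed.
    assert!(
        batch <= MAX_SIMD_DEGREE,
        "batch of {} chunks exceeds MAX_SIMD_DEGREE ({})",
        batch,
        MAX_SIMD_DEGREE
    );
    batch
}
```

With a hacked simd_degree of MAX_SIMD_DEGREE + 1, a 4-chunk input passes silently, while an input of MAX_SIMD_DEGREE + 1 chunks panics.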
I must've written the comment one way, and then changed the context
string later, without realizing that I'd copied it. Apologies to
everyone whose time I wasted with this.
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/15.
The previous version of this API called for a key of exactly 256 bits.
That's good for optimal performance, but it would mean losing the
use-with-other-algorithms property for applications whose input keys are
a different size. There's no way for an abstraction over the previous
version to provide reliable domain separation for the "extract" step.
Smaller chunk sizes are a big benefit for parallelism at shorter input
lengths, and recent benchmarks show that this reduction has a relatively
small cost in terms of peak throughput. It's also a nice round number.
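A rough sketch of the parallelism argument, not taken from the implementation. The helper names are hypothetical, and the chunk lengths used in the note below (2048 and 1024 bytes) are assumed for illustration, since this message doesn't state the old or new value. The point is just that halving the chunk length doubles the number of chunks, so a short input can fill more SIMD lanes.

```rust
/// Number of chunks an input splits into, rounding up so that a
/// trailing partial chunk still counts.
pub fn num_chunks(input_len: usize, chunk_len: usize) -> usize {
    assert!(chunk_len > 0, "chunk_len must be nonzero");
    (input_len + chunk_len - 1) / chunk_len
}

/// Hypothetical measure of available parallelism: chunks that can be
/// hashed side by side, capped by the SIMD degree of the platform.
pub fn usable_lanes(input_len: usize, chunk_len: usize, simd_degree: usize) -> usize {
    num_chunks(input_len, chunk_len).min(simd_degree)
}
```

Under those assumed sizes, a 4096-byte input on an 8-lane platform fills 2 lanes with 2048-byte chunks and 4 lanes with 1024-byte chunks. At 64 KiB both sizes saturate all 8 lanes, which is consistent with the cost only showing up in peak throughput.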