mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-03-29 07:09:54 +01:00
Commit Graph

19 Commits

Author SHA1 Message Date
Jack O'Connor 037de38bfe upgrade to arrayvec 0.7.0
This version uses const generics, which bumps our minimum supported
compiler version to 1.51.
2021-05-18 12:28:29 -04:00
Jack O'Connor 4b7babbe99 more cleanup of undocumented API 2021-03-28 20:04:51 -04:00
Jack O'Connor 05292a018b get rid of the standalone "*_rayon" functions
These clutter the toplevel API, and their prominence might lead callers
to prefer them as a first resort, which probably isn't a good idea.
Restricting multithreading to `Hasher::update_rayon` feels better,
similar to what we've done with `Hasher::finalize_xof`. (But I think
`update_rayon` is still an improvement over the trait-based interface
that it replaced.)
2021-03-21 21:14:13 -04:00
Jack O'Connor b228f46e03 add *_rayon methods 2021-03-14 00:26:18 -05:00
Matthew Krupcale d91f20dd29 Start SSE2 implementation based on SSE4.1 version
Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.

 * Cargo.toml: add no_sse2 feature
 * benches/bench.rs: wire SSE2 benchmarks
 * build.rs: add SSE2 rust intrinsics and assembly builds
 * c/Makefile.testing: add SSE2 C and assembly targets
 * c/README.md: add SSE2 to C build instructions
 * c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
 * c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
 * c/blake3_dispatch.c: add SSE2 C dispatch
 * c/blake3_impl.h: add SSE2 C function prototypes
 * c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
 * c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
   assembly files starting with SSE4.1 version
 * src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
 * src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module
   configurations
 * src/platform.rs: add SSE2 rust platform detection and dispatch
 * src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
 * tools/instruction_set_support/src/main.rs: add SSE2 feature detection
2020-08-24 00:54:46 -04:00
Jack O'Connor b8cdcb1f84 automatically fall back to the pure Rust build
There are two scenarios where compiling AVX-512 C or assembly code might
not work:

1. There might not be a C compiler installed at all. Most commonly this
   is either in cross-compiling situations, or with the Windows GNU
   target.
2. The installed C compiler might not support e.g. -mavx512f, because
   it's too old.

In both of these cases, print a relevant warning, and then automatically
fall back to using the pure Rust intrinsics build.

Note that this only affects x86 targets. Other targets always use pure
Rust, unless the "neon" feature is enabled.
2020-04-01 19:13:15 -04:00
Jack O'Connor e06a0f255a refactor the Cargo feature set
The biggest change here is that assembly implementations are enabled by
default.

Added features:
- "pure" (Pure Rust, with no C or assembly implementations.)

Removed features:
- "c" (Now basically the default.)

Renamed features:
- "c_prefer_intrinsics" -> "prefer_intrinsics"
- "c_neon" -> "neon"

Unchanged:
- "rayon"
- "std" (Still the only feature on by default.)
2020-03-29 18:02:03 -04:00
Jack O'Connor 8d84cfc0af remove a mis-optimization that hurt performance for uneven updates
If the total number of chunks hashed so far is e.g. 1, and update() is
called with e.g. 8 more chunks, we can't compress all 8 together. We
have to break the input up, to make sure that that 1 lone chunk CV gets
merged with its proper sibling, and that in general the correct layout
of the tree is preserved. What we should do is hash 1-2-4-1 chunks of
input, using increasing powers of 2 (with some cleanup at the end). What
we were doing was 2-2-2-2 chunks. This was the result of a mistaken
optimization that got us stuck with an always-odd number of chunks so
far.

Fixes https://github.com/BLAKE3-team/BLAKE3/issues/69.
2020-02-25 11:40:37 -05:00
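The 1-2-4-1 split described in 8d84cfc0af can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the function names `largest_power_of_two_leq` and `subtree_sizes` are hypothetical, the sketch counts whole chunks only, and it ignores the "cleanup at the end" that the commit message mentions. The assumed rule is that each subtree size is the largest power of 2 that fits in the remaining input and also evenly divides the number of chunks hashed so far.

```rust
/// Largest power of two less than or equal to `n` (requires `n >= 1`).
fn largest_power_of_two_leq(n: u64) -> u64 {
    1u64 << (63 - n.leading_zeros())
}

/// Hypothetical helper: split `new_chunks` incoming chunks into subtree
/// sizes, given that `chunks_so_far` whole chunks were already hashed.
fn subtree_sizes(chunks_so_far: u64, new_chunks: u64) -> Vec<u64> {
    let mut count = chunks_so_far;
    let mut remaining = new_chunks;
    let mut sizes = Vec::new();
    while remaining > 0 {
        // Start with the biggest power of two that fits in the input.
        let mut size = largest_power_of_two_leq(remaining);
        // Shrink it until it evenly divides the count so far, so that
        // every lone subtree gets merged with its proper sibling.
        while count & (size - 1) != 0 {
            size /= 2;
        }
        sizes.push(size);
        count += size;
        remaining -= size;
    }
    sizes
}
```

With 1 chunk hashed so far and 8 more arriving, `subtree_sizes(1, 8)` yields `[1, 2, 4, 1]`, matching the sequence in the commit message. Starting from an empty hasher, `subtree_sizes(0, 8)` yields a single subtree of 8.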
Jack O'Connor efbfa0463c integrate assembly implementations into the blake3 crate 2020-02-12 10:23:17 -05:00
Jack O'Connor fc219f4f8d Hasher::update_with_join
This is a new interface that allows the caller to provide a
multi-threading implementation. It's defined in terms of a new `Join`
trait, for which we provide two implementations, `SerialJoin` and
`RayonJoin`. This lets the caller control when multi-threading is used,
rather than the previous all-or-nothing design of the "rayon" feature.

Although existing callers should keep working, this is a compatibility
break, because callers who were relying on automatic multi-threading
before will now be single-threaded. Thus the next release of this crate
will need to be version 0.2.

See https://github.com/BLAKE3-team/BLAKE3/issues/25 and
https://github.com/BLAKE3-team/BLAKE3/issues/54.
2020-02-06 15:07:15 -05:00
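The idea behind fc219f4f8d, letting the caller pick the multi-threading strategy through a type parameter, can be sketched as below. The trait shape here is an assumption for illustration and may not match the crate's real `Join` definition. `ScopedThreadJoin` is a hypothetical stand-in for `RayonJoin` built on `std::thread::scope`, so the sketch needs no external crates, and `sum_with` is a toy divide-and-conquer function standing in for the hasher's recursive subtree work.

```rust
use std::thread;

/// Assumed shape of a join abstraction: run two closures and return
/// both results. How they run (serially or in parallel) is up to the impl.
pub trait Join {
    fn join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send;
}

/// Runs both closures one after the other on the calling thread.
pub enum SerialJoin {}

impl Join for SerialJoin {
    fn join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send,
    {
        (oper_a(), oper_b())
    }
}

/// Hypothetical parallel impl: runs the first closure on a scoped
/// thread while the second runs on the calling thread.
pub enum ScopedThreadJoin {}

impl Join for ScopedThreadJoin {
    fn join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send,
    {
        thread::scope(|s| {
            let handle = s.spawn(oper_a);
            let rb = oper_b();
            let ra = handle.join().expect("worker thread panicked");
            (ra, rb)
        })
    }
}

/// Toy recursive workload: the caller chooses the strategy via `J`.
pub fn sum_with<J: Join>(data: &[u64]) -> u64 {
    if data.len() <= 16 {
        return data.iter().sum();
    }
    let (left, right) = data.split_at(data.len() / 2);
    let (a, b) = J::join(|| sum_with::<J>(left), || sum_with::<J>(right));
    a + b
}
```

A caller would write `sum_with::<SerialJoin>(&data)` or `sum_with::<ScopedThreadJoin>(&data)` and get the same result either way. Only the execution strategy changes, which is the caller-controlled design the commit message describes.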
Jack O'Connor 1384edd67c rename 1_chunk benchmarks to 1_kib 2019-12-13 10:06:46 -05:00
Jack O'Connor fb0682c4c5 add 2 KiB benchmarks 2019-12-13 09:23:21 -05:00
Jack O'Connor fe9b8fedd7 fix benchmarks build 2019-12-12 23:31:02 -05:00
Jack O'Connor 52ea6487f8 switch to representing CVs as words for the compression function
The portable implementation was getting slowed down by converting back
and forth between words and bytes.

I made the corresponding change on the C side first (12a37be8b5),
and as part of this commit I'm re-vendoring the C code. I'm also
exposing a small FFI interface to C so that blake3_neon.c can link
against portable.rs rather than blake3_portable.c, see c_neon.rs.
2019-12-11 18:05:26 -05:00
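The conversion that 52ea6487f8 removes from the hot path looks roughly like the sketch below, assuming a chaining value (CV) of eight little-endian 32-bit words, i.e. 32 bytes. The function names are illustrative and not necessarily those used in the crate. Keeping CVs as words means a round trip like this is only needed at the edges, rather than around every compression call.

```rust
/// Interpret 32 bytes as eight little-endian 32-bit words.
fn words_from_le_bytes(bytes: &[u8; 32]) -> [u32; 8] {
    let mut words = [0u32; 8];
    for (word, chunk) in words.iter_mut().zip(bytes.chunks_exact(4)) {
        *word = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    words
}

/// Serialize eight 32-bit words back into 32 little-endian bytes.
fn le_bytes_from_words(words: &[u32; 8]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (chunk, word) in bytes.chunks_exact_mut(4).zip(words.iter()) {
        chunk.copy_from_slice(&word.to_le_bytes());
    }
    bytes
}
```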
Jack O'Connor ae7271cc87 add benchmarks for AVX-512 and NEON 2019-12-08 21:56:10 -05:00
Jack O'Connor bcb99ba087 fix the benchmarks build 2019-12-07 22:02:19 -05:00
Jack O'Connor d6fbb03d01 add reference impl benchmarks 2019-12-07 21:46:56 -05:00
Jack O'Connor 4b2d856754 add many_parents benchmarks 2019-12-06 17:18:39 -05:00
Jack O'Connor 19471453f5 add bench.rs 2019-12-06 16:17:30 -05:00