mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-05-23 12:56:07 +02:00
Commit Graph

87 Commits

Author SHA1 Message Date
Jack O'Connor 27b7f610e0
Merge pull request #114 from k0001/no-cstr
C: Add blake3_hasher_init_derive_key_len
2020-09-10 14:54:15 -05:00
Renzo Carbonara b205e0efa1 C: rename blake3_hasher_init_derive_key_raw and documentation 2020-09-01 13:20:16 +03:00
Samuel Neves 8610ebda6a add sse2 tests and benchmarks 2020-08-31 19:12:01 +01:00
Samuel Neves bf705f2d54 remove avoidable spill 2020-08-31 19:11:58 +01:00
Samuel Neves 3340e32c7f
Merge pull request #110 from mkrupcale/sse2
Add SSE2 implementations
2020-08-31 18:56:55 +01:00
Matthew Krupcale be2da69b6b C: asm: simplify pblendw emulation
Use statically calculated ~mask. This reduces the number of moves and registers necessary at the expense of an extra memory load. This is probably a good trade-off since we are not bound by memory uops in this loop.
2020-08-31 12:12:42 -04:00
Matthew Krupcale 47e415c7f1 C: asm: simplify pinsrd emulation
Use punpckl{,q}dq instead of pinsrw.
2020-08-31 00:21:47 -04:00
Matthew Krupcale c592e9a3f6 C: asm: remove blendvps usage altogether
This simplifies the operation by removing the need to use blendvps at all.
2020-08-30 23:13:47 -04:00
Renzo Carbonara 31e4080aa2 C: Add blake3_hasher_init_derive_key_len
blake3_hasher_init_derive_key_len is an alternative version of
blake3_hasher_init_derive_key which takes the context and its
length as separate parameters, and not together as a C string.

The motivation for this addition is making it easier for
bindings to this C library to call this function without
having to first copy over the context bytes just to add
one 0x00 byte at the end.

Notice that contrary to blake3_hasher_init_derive_key,
blake3_hasher_init_derive_key_len allows the inclusion of a
0x00 byte in the context. Given the rules about context string
selection, this byte is unlikely to be used as part of a context
string. But if for some reason it is ever given, it will be
included in the context string and processed like any other
non-alphanumeric byte would. For compatibility with
blake3_hasher_init_derive_key, bindings should still check for
the absence of 0x00 bytes.
2020-08-30 12:27:33 +03:00
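
The compatibility check recommended in the commit message above could look roughly like the following in a binding's C glue code. This is a minimal sketch: `context_is_cstr_compatible` is a hypothetical helper name, not part of the BLAKE3 C API, and it only inspects the context bytes without calling into the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the BLAKE3 C API): returns true when the
 * context contains no 0x00 byte, meaning the same bytes could also be passed
 * as a C string to blake3_hasher_init_derive_key. */
static bool context_is_cstr_compatible(const void *context, size_t context_len) {
  if (context_len == 0) {
    return true; /* an empty context has no bytes to inspect */
  }
  return memchr(context, 0, context_len) == NULL;
}
```

A binding might run this check first and reject the context if it returns false, before passing the pointer and length on to blake3_hasher_init_derive_key_len.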
Jack O'Connor c8a5b53e1d wording tweak in the C readme 2020-08-26 16:55:39 -04:00
Matthew Krupcale c33a8462d1 Write _mm_blend_epi16 emulation without multiplication
Use _mm_and_si128 and _mm_cmpeq_epi16 rather than the expensive multiplication _mm_mullo_epi16 with _mm_srai_epi16, which the compiler may not be able to optimize.
2020-08-25 12:26:15 -04:00
Matthew Krupcale 90e2a924a4 Fix Windows MSVC undefined symbol errors
MSVC returns "error A2006:undefined symbol : FFFFFFFFH", so use 0FFFFFFFFH instead. Also use 0 prefix for 0H to align things.
2020-08-24 21:31:29 -04:00
Matthew Krupcale e581035bd3 Put PBLENDW masks in the RDATA section
Previously, these masks were undefined because they were outside of the RDATA section.
2020-08-24 21:26:41 -04:00
Matthew Krupcale 00849f8625 Fix Windows MSVC undefined symbol errors
MSVC returns "error A2006:undefined symbol : B1H", so use 0B1H instead.
2020-08-24 21:20:10 -04:00
Matthew Krupcale e4681ec39e C: asm: emulate pshufb ROT8 using SSE2 instructions
Use a simple shift for the rotation.

 * c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
Matthew Krupcale 769c7cdc96 C: asm: emulate pshufb ROT16 using SSE2 instructions
Use two 16-bit shuffles: one for the low 64 bits and one for the high 64 bits.

 * c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
Matthew Krupcale 1ef915dbea C: asm: emulate pinsrd using SSE2 instructions
Use two pinsrw and a 16-bit shift to insert the 32-bit integer at the desired location.

 * c/blake3_sse2_x86-64_unix.S: emulate pinsrd using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
Matthew Krupcale e632967a8d C: asm: emulate blendvps using SSE2 instructions
Blend according to (mask & b) | ((~mask) & a).

 * c/blake3_sse2_x86-64_unix.S: emulate blendvps using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:28 -04:00
Matthew Krupcale 460c9d3031 C: asm: emulate pblendw using SSE2 instructions
Use a constant mask to blend according to (mask & b) | ((~mask) & a).

 * c/blake3_sse2_x86-64_unix.S: emulate pblendw using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:09 -04:00
Matthew Krupcale a9a701c622 SSE2 intrinsic: emulate _mm_shuffle_epi8 SSSE3 intrinsic rot8 with SSE2 intrinsics
Use a simple shift version for the 8-bit rotation.

 * c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot8 using SSE2 intrinsics
2020-08-24 00:56:57 -04:00
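
The shift-based 8-bit rotation described above can be modelled on a single 32-bit word in plain C. This is a scalar sketch for illustration only, assuming the rotation is a right rotation as in the BLAKE3 round function. The function name is hypothetical, and the actual commit works on SSE2 vectors rather than scalars.

```c
#include <stdint.h>

/* Scalar model (hypothetical name): rotate a 32-bit word right by 8 bits
 * using two shifts and an OR, with no byte shuffle needed. */
static uint32_t rotr32_by_8(uint32_t x) {
  return (x >> 8) | (x << 24);
}
```

The vector version would apply the same two shifts and OR to every 32-bit lane at once.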
Matthew Krupcale 92c8047a15 SSE2 intrinsic: emulate _mm_shuffle_epi8 SSSE3 intrinsic rot16 with SSE2 intrinsics
Use two 16-bit shuffles: one for the low 64 bits and one for the high 64 bits.

 * c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot16 using SSE2 intrinsics
2020-08-24 00:56:46 -04:00
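
Rotating each 32-bit word by 16 bits is the same as swapping its two 16-bit halves, which is why two 16-bit shuffles are enough. The sketch below models a 128-bit vector as eight 16-bit lanes in plain C. The function names are hypothetical, and the lane order 1,0,3,2 within each 64-bit half is an assumption about how the shuffles are configured.

```c
#include <stdint.h>

/* Swap adjacent 16-bit lanes within one 64-bit half (four lanes),
 * i.e. reorder the lanes as 1,0,3,2. */
static void swap_pairs4(uint16_t half[4]) {
  uint16_t t;
  t = half[0]; half[0] = half[1]; half[1] = t;
  t = half[2]; half[2] = half[3]; half[3] = t;
}

/* Scalar model of a 16-bit rotation of each 32-bit word in a 128-bit
 * vector: one shuffle for the low four lanes, one for the high four. */
static void rot16_lanes(uint16_t v[8]) {
  swap_pairs4(v);     /* low 64 bits */
  swap_pairs4(v + 4); /* high 64 bits */
}
```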
Matthew Krupcale 40a4a2b6b0 SSE2 intrinsic: emulate _mm_blend_epi16 SSE4.1 intrinsic with SSE2 intrinsics
Use a constant mask to blend according to (mask & b) | ((~mask) & a).

 * src/rust_sse2.rs: emulate _mm_blend_epi16 using SSE2 intrinsics
 * c/blake3_sse2.c: Likewise.
2020-08-24 00:55:06 -04:00
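
The blend formula (mask & b) | ((~mask) & a) can be illustrated with a scalar model of the eight 16-bit lanes. This is a sketch, not the code from the commit: the function name is hypothetical, and the per-lane mask is expanded from an 8-bit immediate in a loop, whereas the real implementation uses a constant vector mask.

```c
#include <stdint.h>

/* Scalar model (hypothetical name) of a 16-bit lane blend: lane i of the
 * result comes from b when bit i of imm8 is set, otherwise from a. */
static void blend_epi16_model(const uint16_t a[8], const uint16_t b[8],
                              uint8_t imm8, uint16_t out[8]) {
  for (int i = 0; i < 8; i++) {
    /* Expand one selector bit into an all-ones or all-zeros lane mask. */
    uint16_t mask = ((imm8 >> i) & 1) ? 0xFFFF : 0x0000;
    out[i] = (uint16_t)((mask & b[i]) | ((uint16_t)~mask & a[i]));
  }
}
```

Because the mask is either all ones or all zeros in each lane, the two AND terms never overlap, so the OR simply selects one input per lane.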
Matthew Krupcale d91f20dd29 Start SSE2 implementation based on SSE4.1 version
Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.

 * Cargo.toml: add no_sse2 feature
 * benches/bench.rs: wire SSE2 benchmarks
 * build.rs: add SSE2 rust intrinsics and assembly builds
 * c/Makefile.testing: add SSE2 C and assembly targets
 * c/README.md: add SSE2 to C build instructions
 * c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
 * c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
 * c/blake3_dispatch.c: add SSE2 C dispatch
 * c/blake3_impl.h: add SSE2 C function prototypes
 * c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
 * c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
   assembly files starting with SSE4.1 version
 * src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
 * src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module
   configurations
 * src/platform.rs: add SSE2 rust platform detection and dispatch
 * src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
 * tools/instruction_set_support/src/main.rs: add SSE2 feature detection
2020-08-24 00:54:46 -04:00
Samuel Neves adbf07d67a Fix #109
The default executable stack setting on Linux can be fixed in two different ways:

 - By adding the `.section .note.GNU-stack,"",%progbits` special incantation
 - By passing the `--noexecstack` flag to the assembler

This patch implements both, but only one of them is strictly necessary.

I've also added some additional hardening flags to the Makefile. May not be portable.
2020-08-23 22:32:36 +01:00
Samuel Neves b01784a057 support compilers without __has_include 2020-07-30 00:03:37 +01:00
Jack O'Connor e83cbbb8f5 clarify multithreading support in the C readme
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/99.
2020-07-20 10:01:00 -04:00
Jack O'Connor e4703ac170 rename the C Makefile to Makefile.testing 2020-07-20 09:47:38 -04:00
Jack O'Connor 2f6f56f347 stop being a jerk and add the context string to test_vectors.json 2020-06-29 16:38:53 -04:00
Samuel Neves f2005678f8
Merge pull request #96 from BLAKE3-team/cet
Assembly: enable CET
2020-06-27 18:04:02 +01:00
Samuel Neves a3ec6c1ccf enable CET on asm 2020-06-27 17:44:43 +01:00
Jack O'Connor c908847c3f shrink a stack array that's twice as big as it needs to be
It looks like I originally made this mistake when I was copying code
from the baokeshed prototype (a274a9b0fa),
and then it got replicated into the C implementation later.
2020-06-26 16:16:55 -04:00
Samuel Neves 7ef795d62e Do not require AVX512DQ
Whereas vinserti64x4 is present on AVX512F, vinserti32x8 requires
AVX512DQ, which we do not test for. At this point there is not
much risk of incompatibility, since Skylake-X chips have all the
required instruction sets, but let's be precise about this.
2020-04-12 11:38:11 +01:00
Samuel Neves eec458d03e move prototypes to shared header file, and make all local functions static. 2020-03-31 21:21:08 +01:00
Jack O'Connor e06a0f255a refactor the Cargo feature set
The biggest change here is that assembly implementations are enabled by
default.

Added features:
- "pure" (Pure Rust, with no C or assembly implementations.)

Removed features:
- "c" (Now basically the default.)

Renamed features:
- "c_prefer_intrinsics" -> "prefer_intrinsics"
- "c_neon" -> "neon"

Unchanged:
- "rayon"
- "std" (Still the only feature on by default.)
2020-03-29 18:02:03 -04:00
Samuel Neves 13556be388 save missing clobbered registers on Windows 2020-03-29 05:53:37 +01:00
Jack O'Connor c26a37f70c C files -> C and assembly files 2020-03-25 17:25:48 -04:00
Jack O'Connor c3639b4255 c/README.md changes
The C implementation now supports output seeking. Also expand the API
section a bit, and reorganize things to put the example on top.
2020-03-25 17:11:36 -04:00
Jack O'Connor a4ceef3932 add blake3_hasher_finalize_seek to the C API 2020-03-25 17:11:36 -04:00
Jack O'Connor 9d77bd6958 correct a comment 2020-03-17 14:26:39 -04:00
Jack O'Connor 48f2f745d9 clean up the C example a bit 2020-03-01 17:33:36 -05:00
Jack O'Connor 0432f9c7a3 some comment typos 2020-02-27 09:52:46 -05:00
Jack O'Connor 8d84cfc0af remove a mis-optimization that hurt performance for uneven updates
If the total number of chunks hashed so far is e.g. 1, and update() is
called with e.g. 8 more chunks, we can't compress all 8 together. We
have to break the input up, to make sure that that 1 lone chunk CV gets
merged with its proper sibling, and that in general the correct layout
of the tree is preserved. What we should do is hash 1-2-4-1 chunks of
input, using increasing powers of 2 (with some cleanup at the end). What
we were doing was 2-2-2-2 chunks. This was the result of a mistaken
optimization that got us stuck with an always-odd number of chunks so
far.

Fixes https://github.com/BLAKE3-team/BLAKE3/issues/69.
2020-02-25 11:40:37 -05:00
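
The 1-2-4-1 split described above can be reproduced with a small planning function. This is a simplified model that counts whole chunks only. The function names are hypothetical, and the real update() also has to deal with partial chunks and with keeping the final chunk unfinalized. Each step takes the largest power of two that fits in the remaining input and evenly divides the number of chunks hashed so far.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest power of two less than or equal to n (n must be nonzero). */
static uint64_t largest_pow2_leq(uint64_t n) {
  uint64_t p = 1;
  while (p <= n / 2) {
    p *= 2;
  }
  return p;
}

/* Simplified model (hypothetical name): split new_chunks into subtree sizes
 * so that each subtree starts at a chunk count divisible by its size.
 * Writes up to max_sizes entries into sizes and returns how many it wrote. */
static size_t plan_subtrees(uint64_t chunks_so_far, uint64_t new_chunks,
                            uint64_t *sizes, size_t max_sizes) {
  size_t n = 0;
  while (new_chunks > 0 && n < max_sizes) {
    uint64_t size = largest_pow2_leq(new_chunks);
    /* Shrink until the size evenly divides the chunk count so far. */
    while (((size - 1) & chunks_so_far) != 0) {
      size /= 2;
    }
    sizes[n++] = size;
    chunks_so_far += size;
    new_chunks -= size;
  }
  return n;
}
```

With 1 chunk hashed so far and 8 new chunks, this model yields the sizes 1, 2, 4, 1, matching the layout the commit message describes.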
Samuel Neves 421a21abd8 Fix bug inadvertently introduced in a1c4c4efb5 2020-02-13 16:08:07 +00:00
Samuel Neves 207915a751 Work around GCC bug 85328 by forcing trivially masked stores.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85328

Fixes #58.
2020-02-13 15:22:17 +00:00
Samuel Neves fa6f14cafa Work around clang bug 36144 by replacing anonymous label numbers.
https://bugs.llvm.org/show_bug.cgi?id=36144

Fixes #60.
2020-02-13 15:22:17 +00:00
Jack O'Connor fcc14c8c1b more file renaming, use underscores more consistently 2020-02-12 18:41:41 -05:00
Erik Johansson 0281f1ae16 Rename assembly files (blake3-* -> blake3_*)
This gives the assembly files the same prefix as the intrinsics files which
simplifies building when the build system should pick between the assembly and
the intrinsics files.
2020-02-12 23:08:44 +01:00
Jack O'Connor 1c4d7fdd8d add test_asm to the C Makefile 2020-02-12 13:12:05 -05:00
Jack O'Connor 7ee05ba3bd document how to build the C code with assembly implementations 2020-02-12 13:04:03 -05:00
Jack O'Connor b8a1d2d982 integrate assembly implementations into blake3_c_rust_bindings 2020-02-12 10:23:17 -05:00