mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-05-18 16:26:07 +02:00
603 Commits

Author SHA1 Message Date
Samuel Neves bf705f2d54 remove avoidable spill 2020-08-31 19:11:58 +01:00
Samuel Neves 3340e32c7f
Merge pull request #110 from mkrupcale/sse2
Add SSE2 implementations
2020-08-31 18:56:55 +01:00
Matthew Krupcale be2da69b6b C: asm: simplify pblendw emulation
Use statically calculated ~mask. This reduces the number of moves and registers necessary at the expense of an extra memory load. This is probably a good trade-off since we are not bound by memory uops in this loop.
2020-08-31 12:12:42 -04:00
Nikolai Vazquez 324090b2c3 Implement `fmt::Debug` using builders
This enables pretty printing via `{:#?}`. The normal style for `{:?}` is
kept exactly the same.
2020-08-31 12:04:40 -04:00
Matthew Krupcale 47e415c7f1 C: asm: simplify pinsrd emulation
Use punpckl{,q}dq instead of pinsrw.
2020-08-31 00:21:47 -04:00
Matthew Krupcale c592e9a3f6 C: asm: remove blendvps usage altogether
This simplifies the operation by removing the need to use blendvps at all.
2020-08-30 23:13:47 -04:00
Renzo Carbonara 31e4080aa2 C: Add blake3_hasher_init_derive_key_len
blake3_hasher_init_derive_key_len is an alternative version of
blake3_hasher_init_derive_key which takes the context and its
length as separate parameters, and not together as a C string.

The motivation for this addition is making it easier for
bindings to this C library to call this function without
having to first copy over the context bytes just to add
one 0x00 byte at the end.

Notice that contrary to blake3_hasher_init_derive_key,
blake3_hasher_init_derive_key_len allows the inclusion of a
0x00 byte in the context. Given the rules about context string
selection, this byte is unlikely to be used as part of a context
string. But if for some reason it is ever given, it will be
included in the context string and processed like any other
non-alphanumeric byte would. For compatibility with
blake3_hasher_init_derive_key, bindings should still check for
the absence of 0x00 bytes.
2020-08-30 12:27:33 +03:00
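The compatibility note above can be illustrated from the binding side. The following is a minimal sketch, not code from the BLAKE3 repository: the helper names `c_string_view` and `is_c_string_compatible` are hypothetical, and the sketch only models how a NUL-terminated API would see a context compared with a pointer-plus-length API.

```rust
/// Hypothetical helper: the bytes a NUL-terminated (C string) API would
/// see, i.e. everything before the first 0x00 byte.
fn c_string_view(context: &[u8]) -> &[u8] {
    match context.iter().position(|&byte| byte == 0) {
        Some(nul_index) => &context[..nul_index],
        None => context,
    }
}

/// Hypothetical helper: true when the context contains no 0x00 byte, so a
/// C-string API and a pointer-plus-length API would process the same bytes.
fn is_c_string_compatible(context: &[u8]) -> bool {
    !context.contains(&0)
}
```

A binding that wants both entry points to agree could reject any context for which `is_c_string_compatible` returns false before calling into the C library.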
Jack O'Connor c8a5b53e1d wording tweak in the C readme 2020-08-26 16:55:39 -04:00
Matthew Krupcale c33a8462d1 Write _mm_blend_epi16 emulation without multiplication
Use _mm_and_si128 and _mm_cmpeq_epi16 rather than the expensive _mm_mullo_epi16 multiplication with _mm_srai_epi16, which the compiler may not be able to optimize.
2020-08-25 12:26:15 -04:00
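As a rough illustration of the and-then-compare approach described above, here is a portable scalar model in Rust. It is a sketch under stated assumptions rather than the repository's code: the 128-bit register is modeled as eight `u16` lanes, the names `Lanes`, `lane_mask`, and `blend_epi16` are made up for this example, and the per-lane bit constants are an assumption about how the mask is derived from the 8-bit immediate.

```rust
/// Scalar model of a 128-bit register viewed as eight 16-bit lanes.
type Lanes = [u16; 8];

/// Expand an 8-bit blend immediate into a per-lane mask: lane i becomes
/// 0xFFFF when bit i of `imm8` is set, and 0 otherwise.
fn lane_mask(imm8: u8) -> Lanes {
    let mut mask = [0u16; 8];
    for i in 0..8 {
        let bit = 1u16 << i; // per-lane constant: 0x01, 0x02, ..., 0x80
        let anded = (imm8 as u16) & bit; // models the AND step
        mask[i] = if anded == bit { 0xFFFF } else { 0 }; // models the compare-equal step
    }
    mask
}

/// Blend according to (mask & b) | ((~mask) & a): lane i comes from `b`
/// where the mask is set and from `a` elsewhere.
fn blend_epi16(a: Lanes, b: Lanes, imm8: u8) -> Lanes {
    let mask = lane_mask(imm8);
    let mut out = [0u16; 8];
    for i in 0..8 {
        out[i] = (mask[i] & b[i]) | (!mask[i] & a[i]);
    }
    out
}
```

No multiplication or arithmetic shift is needed: each mask lane is either all ones or all zeros after the comparison.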
Matthew Krupcale 90e2a924a4 Fix Windows MSVC undefined symbol errors
MSVC returns "error A2006:undefined symbol : FFFFFFFFH", so use 0FFFFFFFFH instead. Also use 0 prefix for 0H to align things.
2020-08-24 21:31:29 -04:00
Matthew Krupcale e581035bd3 Put PBLENDW masks in the RDATA section
Previously, these masks were undefined because they were outside of the RDATA section.
2020-08-24 21:26:41 -04:00
Matthew Krupcale 00849f8625 Fix Windows MSVC undefined symbol errors
MSVC returns "error A2006:undefined symbol : B1H", so use 0B1H instead.
2020-08-24 21:20:10 -04:00
Matthew Krupcale c32660099a Fix unreachable expression compiler warning
SSE2 target_feature appears to always be present for x86_64.
2020-08-24 21:09:56 -04:00
Matthew Krupcale e4681ec39e C: asm: emulate pshufb ROT8 using SSE2 instructions
Use a simple shift for the rotation.

 * c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
Matthew Krupcale 769c7cdc96 C: asm: emulate pshufb ROT16 using SSE2 instructions
Use two 16-bit shuffles: one for the low 64-bits and one for the high 64-bits.

 * c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
Matthew Krupcale 1ef915dbea C: asm: emulate pinsrd using SSE2 instructions
Use two pinsrw and a 16-bit shift to insert the 32-bit integer at the desired location.

 * c/blake3_sse2_x86-64_unix.S: emulate pinsrd using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
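The two-insert idea above can be sketched with a scalar model. This is an illustration only, assuming little-endian lane order; `insert_u16` stands in for a single 16-bit lane insert and `insert_u32` for the emulated 32-bit insert, and both names are hypothetical.

```rust
/// Models a 16-bit lane insert: replace lane `index` (0..8) with `value`.
fn insert_u16(mut lanes: [u16; 8], index: usize, value: u16) -> [u16; 8] {
    lanes[index] = value;
    lanes
}

/// Models a 32-bit lane insert built from two 16-bit inserts. The 32-bit
/// lane `index` (0..4) covers 16-bit lanes 2*index and 2*index + 1.
fn insert_u32(lanes: [u16; 8], index: usize, value: u32) -> [u16; 8] {
    let low = value as u16; // low half goes to the even lane
    let high = (value >> 16) as u16; // the 16-bit shift exposes the high half
    let lanes = insert_u16(lanes, 2 * index, low);
    insert_u16(lanes, 2 * index + 1, high)
}
```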
Matthew Krupcale e632967a8d C: asm: emulate blendvps using SSE2 instructions
Blend according to (mask & b) | ((~mask) & a).

 * c/blake3_sse2_x86-64_unix.S: emulate blendvps using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:28 -04:00
Matthew Krupcale 460c9d3031 C: asm: emulate pblendw using SSE2 instructions
Use a constant mask to blend according to (mask & b) | ((~mask) & a).

 * c/blake3_sse2_x86-64_unix.S: emulate pblendw using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:09 -04:00
Matthew Krupcale a9a701c622 SSE2 intrinsic: emulate _mm_shuffle_epi8 SSSE3 intrinsic rot8 with SSE2 intrinsics
Use a simple shift version for the 8-bit rotation.

 * c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot8 using SSE2 intrinsics
2020-08-24 00:56:57 -04:00
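The shift-based 8-bit rotation can be modeled per 32-bit word as follows. This is a scalar sketch, not the intrinsic code itself: it assumes the rotation is a right rotation (as in the BLAKE3 round function), and the name `rot8` is used here only for illustration.

```rust
/// Rotate each 32-bit word right by 8 bits using two shifts and an OR,
/// with no byte shuffle involved.
fn rot8(words: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = (words[i] >> 8) | (words[i] << 24);
    }
    out
}
```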
Matthew Krupcale 92c8047a15 SSE2 intrinsic: emulate _mm_shuffle_epi8 SSSE3 intrinsic rot16 with SSE2 intrinsics
Use two 16-bit shuffles: one for the low 64-bits and one for the high 64-bits.

 * c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot16 using SSE2 intrinsics
2020-08-24 00:56:46 -04:00
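A 16-bit rotation of a 32-bit word is the same as swapping its two 16-bit halves, which is why two 16-bit shuffles suffice. The scalar model below is a sketch under assumptions: lanes are in little-endian order, the lane order 1, 0, 3, 2 within each 64-bit half is assumed to be what the shuffles select, and the names `swap_adjacent_u16_lanes` and `rot16` are hypothetical.

```rust
/// Models the pair of 16-bit shuffles: within each 64-bit half, pick
/// source lanes in the order 1, 0, 3, 2.
fn swap_adjacent_u16_lanes(lanes: [u16; 8]) -> [u16; 8] {
    [
        lanes[1], lanes[0], lanes[3], lanes[2], // low 64 bits
        lanes[5], lanes[4], lanes[7], lanes[6], // high 64 bits
    ]
}

/// Rotate each 32-bit word by 16 bits by swapping its 16-bit halves.
fn rot16(words: [u32; 4]) -> [u32; 4] {
    // Split each word into its low and high 16-bit lanes.
    let mut lanes = [0u16; 8];
    for i in 0..4 {
        lanes[2 * i] = words[i] as u16;
        lanes[2 * i + 1] = (words[i] >> 16) as u16;
    }
    let swapped = swap_adjacent_u16_lanes(lanes);
    // Reassemble the words from the swapped lanes.
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = (swapped[2 * i] as u32) | ((swapped[2 * i + 1] as u32) << 16);
    }
    out
}
```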
Matthew Krupcale 40a4a2b6b0 SSE2 intrinsic: emulate _mm_blend_epi16 SSE4.1 intrinsic with SSE2 intrinsics
Use a constant mask to blend according to (mask & b) | ((~mask) & a).

 * src/rust_sse2.rs: emulate _mm_blend_epi16 using SSE2 intrinsics
 * c/blake3_sse2.c: Likewise.
2020-08-24 00:55:06 -04:00
Matthew Krupcale d91f20dd29 Start SSE2 implementation based on SSE4.1 version
Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.

 * Cargo.toml: add no_sse2 feature
 * benches/bench.rs: wire SSE2 benchmarks
 * build.rs: add SSE2 rust intrinsics and assembly builds
 * c/Makefile.testing: add SSE2 C and assembly targets
 * c/README.md: add SSE2 to C build instructions
 * c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
 * c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
 * c/blake3_dispatch.c: add SSE2 C dispatch
 * c/blake3_impl.h: add SSE2 C function prototypes
 * c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
 * c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
   assembly files starting with SSE4.1 version
 * src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
 * src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module
   configurations
 * src/platform.rs: add SSE2 rust platform detection and dispatch
 * src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
 * tools/instruction_set_support/src/main.rs: add SSE2 feature detection
2020-08-24 00:54:46 -04:00
Samuel Neves adbf07d67a Fix #109
The default executable stack setting on Linux can be fixed in two different ways:

 - By adding the `.section .note.GNU-stack,"",%progbits` special incantation
 - By passing the `--noexecstack` flag to the assembler

This patch implements both, but only one of them is strictly necessary.

I've also added some additional hardening flags to the Makefile. May not be portable.
2020-08-23 22:32:36 +01:00
Jack O'Connor 8dc30a2737 assembly authorship in the README 2020-08-19 10:52:32 -04:00
Jack O'Connor 09cc03614d the same hex example for rustdocs 2020-08-14 11:33:53 -04:00
Jack O'Connor 88094169fc tweak the readme hex example 2020-08-14 11:30:46 -04:00
Alexx Roche 7589c8c608 How to access the hex inside the Hash()
blake3::hash(b"str") -> Hash(hex), and I want the `hex` extracted from the Hash() as a str
2020-08-14 11:22:56 -04:00
Jack O'Connor 63d27d4d1e version 0.3.6
Changes since 0.3.5:
- Fix a build break in the assembly files under older versions of GCC.
2020-07-29 19:21:23 -04:00
Samuel Neves b01784a057 support compilers without __has_include 2020-07-30 00:03:37 +01:00
Jack O'Connor e83cbbb8f5 clarify multithreading support in the C readme
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/99.
2020-07-20 10:01:00 -04:00
Jack O'Connor e4703ac170 rename the C Makefile to Makefile.testing 2020-07-20 09:47:38 -04:00
Jack O'Connor 7d0de7be14 version 0.3.5
Changes since 0.3.4:
- The `digest` dependency is now v0.9 and the `crypto-mac` dependency is
  now v0.8.
- Intel CET is supported in the assembly implementations.
- `b3sum` error output includes filepaths again.
2020-07-10 12:21:12 -04:00
Jack O'Connor 2f6f56f347 stop being a jerk and add the context string to test_vectors.json 2020-06-29 16:38:53 -04:00
Samuel Neves f2005678f8
Merge pull request #96 from BLAKE3-team/cet
Assembly: enable CET
2020-06-27 18:04:02 +01:00
Samuel Neves a3ec6c1ccf enable CET on asm 2020-06-27 17:44:43 +01:00
Jack O'Connor c908847c3f shrink a stack array that's twice as big as it needs to be
It looks like I originally made this mistake when I was copying code
from the baokeshed prototype (a274a9b0fa),
and then it got replicated into the C implementation later.
2020-06-26 16:16:55 -04:00
Jack O'Connor e0f193ddc9 put the file name in b3sum error output
This was previously there, but got dropped in
c5c07bb337.
2020-06-24 18:02:16 -04:00
Jack O'Connor 4c41a893a0 a little bit of cleanup and more testing 2020-06-14 14:35:47 -04:00
Justus K 1ecb14ce34 Replace std::io::copy with clone_from_slice 2020-06-14 14:35:14 -04:00
Justus K 7eea9b4c75 Bump digest to 0.9.0 and crypto-mac to 0.8.0 2020-06-14 14:35:14 -04:00
Jack O'Connor e63ad97e8b link to prebuilt binaries from the b3sum README 2020-05-26 00:25:39 -04:00
Jack O'Connor f287b56bc6 all-capitalize "FILE" in the b3sum help output 2020-05-25 21:22:29 -04:00
Jack O'Connor 0215604c59 avoid repeating a string 2020-05-25 21:20:43 -04:00
Jack O'Connor ca9687e36c fix another small mistake in the docs 2020-05-23 15:16:02 -04:00
Jack O'Connor 0694d0f6a5 fix a typo in the docs 2020-05-23 15:13:19 -04:00
Jack O'Connor 8d6f0f2574 add a test comment 2020-05-23 14:56:43 -04:00
Jack O'Connor 7f154ceea3 version 0.3.4
Changes since 0.3.3:
- `b3sum` now supports the `--check` flag. This is intended to be a
  drop-in replacement for e.g. `md5sum --check` from Coreutils. The
  behavior is somewhat stricter than Coreutils with respect to invalid
  Unicode in filenames. For a complete description of how `--check`
  works, see the file `b3sum/what_does_check_do.md`.
- To support the `--check` feature, backslashes and newlines that appear
  in filenames are now escaped in the output of `b3sum`. This is done
  the same way as in Coreutils.
- To support `--check` interoperability between Unix and Windows,
  backslashes in filepaths on Windows are now replaced with forward
  slashes in the output of `b3sum`. Note that this is different from
  Coreutils.
2020-05-23 14:37:49 -04:00
Jack O'Connor cd093791ab remove an extra space in some help text 2020-05-23 14:28:10 -04:00
Jack O'Connor 48512ec4f0 use wild::args_os to support globbing on Windows 2020-05-23 12:46:45 -04:00
Jack O'Connor c9a1676942 add support for --quiet to `b3sum --check`
Suggested by @llowrey:
https://github.com/BLAKE3-team/BLAKE3/issues/33#issuecomment-629853747
2020-05-23 12:27:48 -04:00