mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-05-12 18:56:26 +02:00
Commit Graph

340 Commits

Author SHA1 Message Date
Matthew Krupcale c33a8462d1 Write _mm_blend_epi16 emulation without multiplication
Use _mm_and_si128 and _mm_cmpeq_epi16 rather than the expensive _mm_mullo_epi16 multiplication with _mm_srai_epi16, which the compiler may not be able to optimize.
2020-08-25 12:26:15 -04:00
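As a rough illustration of the multiplication-free blend described in c33a8462d1, the sketch below builds the per-lane mask with `_mm_and_si128` and `_mm_cmpeq_epi16` and then applies (mask & b) | ((~mask) & a). The names `blend_epi16_sse2`, `lane16`, and `blend_demo_lane` are hypothetical and chosen for this example; the code in the repository may differ in detail.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sketch (hypothetical name): emulate SSE4.1 _mm_blend_epi16(a, b, imm8)
 * using SSE2 only. Lane i comes from b when bit i of imm8 is set, else a. */
static __m128i blend_epi16_sse2(__m128i a, __m128i b, int imm8) {
  /* One distinct bit per 16-bit lane; the last argument is lane 0. */
  const __m128i bits =
      _mm_set_epi16(0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01);
  __m128i mask = _mm_set1_epi16((short)imm8); /* imm8 in every lane */
  mask = _mm_and_si128(mask, bits);   /* keep only this lane's bit */
  mask = _mm_cmpeq_epi16(mask, bits); /* 0xFFFF where the bit was set */
  return _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}

/* Hypothetical helper: read 16-bit lane i of v. */
static uint16_t lane16(__m128i v, int i) {
  uint16_t out[8];
  _mm_storeu_si128((__m128i *)out, v);
  return out[i];
}

/* Hypothetical demo: blend a = 0xAAAA.. with b = 0xBBBB.. and
 * return one lane of the result. */
static uint16_t blend_demo_lane(int imm8, int lane) {
  __m128i a = _mm_set1_epi16((short)0xAAAA);
  __m128i b = _mm_set1_epi16((short)0xBBBB);
  return lane16(blend_epi16_sse2(a, b, imm8), lane);
}
```

With `imm8 = 0xCC`, lanes 2, 3, 6, and 7 are taken from `b` and the rest from `a`.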
Matthew Krupcale 90e2a924a4 Fix Windows MSVC undefined symbol errors
MSVC returns "error A2006:undefined symbol : FFFFFFFFH", so use 0FFFFFFFFH instead. Also use 0 prefix for 0H to align things.
2020-08-24 21:31:29 -04:00
Matthew Krupcale e581035bd3 Put PBLENDW masks in the RDATA section
Previously, these masks were undefined because they were outside of the RDATA section.
2020-08-24 21:26:41 -04:00
Matthew Krupcale 00849f8625 Fix Windows MSVC undefined symbol errors
MSVC returns "error A2006:undefined symbol : B1H", so use 0B1H instead.
2020-08-24 21:20:10 -04:00
Matthew Krupcale c32660099a Fix unreachable expression compiler warning
SSE2 target_feature appears to always be present for x86_64.
2020-08-24 21:09:56 -04:00
Matthew Krupcale e4681ec39e C: asm: emulate pshufb ROT8 using SSE2 instructions
Use a simple shift for the rotation.

 * c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
Matthew Krupcale 769c7cdc96 C: asm: emulate pshufb ROT16 using SSE2 instructions
Use two 16-bit shuffles: one for the low 64-bits and one for the high 64-bits.

 * c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
Matthew Krupcale 1ef915dbea C: asm: emulate pinsrd using SSE2 instructions
Use two pinsrw and a 16-bit shift to insert the 32-bit integer at the desired location.

 * c/blake3_sse2_x86-64_unix.S: emulate pinsrd using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
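The pinsrd emulation in 1ef915dbea is written in assembly; a C-intrinsics analogue of the same idea (two 16-bit inserts, with a 16-bit shift to obtain the upper half) might look like the sketch below. The macro `INSERT_EPI32_SSE2` and the helpers are hypothetical names for this example, and the lane index must be a compile-time constant, as with `pinsrw` itself.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sketch (hypothetical macro): insert the 32-bit value x into 32-bit lane
 * `lane` of v using two 16-bit inserts. The low half goes to word 2*lane,
 * and the high half (x shifted right by 16) goes to word 2*lane + 1. */
#define INSERT_EPI32_SSE2(v, x, lane)                                   \
  _mm_insert_epi16(                                                     \
      _mm_insert_epi16((v), (int)((uint32_t)(x) & 0xFFFF), 2 * (lane)), \
      (int)((uint32_t)(x) >> 16), 2 * (lane) + 1)

/* Hypothetical helper: read 32-bit lane i of v. */
static uint32_t lane32(__m128i v, int i) {
  uint32_t out[4];
  _mm_storeu_si128((__m128i *)out, v);
  return out[i];
}

/* Hypothetical demo: start from lanes {0, 1, 2, 3}, overwrite lane 2
 * with x, and return the requested lane of the result. */
static uint32_t insert_demo_lane(uint32_t x, int lane) {
  __m128i v = _mm_set_epi32(3, 2, 1, 0); /* last argument is lane 0 */
  v = INSERT_EPI32_SSE2(v, x, 2);
  return lane32(v, lane);
}
```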
Matthew Krupcale e632967a8d C: asm: emulate blendvps using SSE2 instructions
Blend according to (mask & b) | ((~mask) & a).

 * c/blake3_sse2_x86-64_unix.S: emulate blendvps using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:28 -04:00
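The blend formula from e632967a8d, (mask & b) | ((~mask) & a), can be sketched with C intrinsics as follows. This is an illustration rather than the assembly from the commit: it works in the integer domain, and it adds one assumption of its own, namely an arithmetic shift that broadcasts each lane's sign bit, because `blendvps` selects on the sign bit only. If the mask lanes are already all ones or all zeros, that step is redundant. All names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sketch (hypothetical name): per 32-bit lane, pick b where the mask's
 * sign bit is set and a otherwise. */
static __m128i blendv_epi32_sse2(__m128i a, __m128i b, __m128i mask) {
  /* Assumption of this sketch: spread the sign bit across the lane so
   * that a partial mask behaves like blendvps. */
  __m128i full = _mm_srai_epi32(mask, 31);
  /* (full & b) | (~full & a) */
  return _mm_or_si128(_mm_and_si128(full, b), _mm_andnot_si128(full, a));
}

/* Hypothetical helper: read 32-bit lane i of v. */
static uint32_t lane32(__m128i v, int i) {
  uint32_t out[4];
  _mm_storeu_si128((__m128i *)out, v);
  return out[i];
}

/* Hypothetical demo: a = {10, 11, 12, 13}, b = {20, 21, 22, 23}, and the
 * mask selects b in lanes 0 and 2. Returns one lane of the result. */
static uint32_t blendv_demo_lane(int lane) {
  __m128i a = _mm_set_epi32(13, 12, 11, 10); /* last argument is lane 0 */
  __m128i b = _mm_set_epi32(23, 22, 21, 20);
  __m128i mask = _mm_set_epi32(0, -1, 0, (int)0x80000000u);
  return lane32(blendv_epi32_sse2(a, b, mask), lane);
}
```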
Matthew Krupcale 460c9d3031 C: asm: emulate pblendw using SSE2 instructions
Use a constant mask to blend according to (mask & b) | ((~mask) & a).

 * c/blake3_sse2_x86-64_unix.S: emulate pblendw using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:09 -04:00
Matthew Krupcale a9a701c622 SSE2 intrinsic: emulate _mm_shuffle_epi8 SSSE3 intrinsic rot8 with SSE2 intrinsics
Use a simple shift version for the 8-bit rotation.

 * c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot8 using SSE2 intrinsics
2020-08-24 00:56:57 -04:00
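A shift-based 8-bit rotation of the kind a9a701c622 describes can be sketched as below, assuming the rotation is a right rotation of each 32-bit lane. The function and helper names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sketch (hypothetical name): rotate each 32-bit lane right by 8 bits
 * using two shifts and an OR instead of the SSSE3 byte shuffle. */
static __m128i rot8_sse2(__m128i x) {
  return _mm_or_si128(_mm_srli_epi32(x, 8), _mm_slli_epi32(x, 32 - 8));
}

/* Hypothetical helper: read 32-bit lane i of v. */
static uint32_t lane32(__m128i v, int i) {
  uint32_t out[4];
  _mm_storeu_si128((__m128i *)out, v);
  return out[i];
}

/* Hypothetical demo: rotate a vector holding x in every lane and
 * return one lane of the result. */
static uint32_t rot8_demo_lane(uint32_t x, int lane) {
  return lane32(rot8_sse2(_mm_set1_epi32((int)x)), lane);
}
```

For example, `0x11223344` becomes `0x44112233` in every lane.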
Matthew Krupcale 92c8047a15 SSE2 intrinsic: emulate _mm_shuffle_epi8 SSSE3 intrinsic rot16 with SSE2 intrinsics
Use two 16-bit shuffles: one for the low 64-bits and one for the high 64-bits.

 * c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot16 using SSE2 intrinsics
2020-08-24 00:56:46 -04:00
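The two-shuffle approach from 92c8047a15 can be sketched as follows: swapping the two 16-bit halves of each 32-bit lane is a 16-bit rotation, and SSE2 needs one shuffle for the low 64 bits and one for the high 64 bits. The shuffle immediate `0xB1` and the names below are assumptions of this sketch.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sketch (hypothetical name): rotate each 32-bit lane by 16 bits.
 * 0xB1 selects words in the order 1, 0, 3, 2, which swaps the halves of
 * both 32-bit lanes inside a 64-bit half of the register. */
static __m128i rot16_sse2(__m128i x) {
  return _mm_shufflehi_epi16(_mm_shufflelo_epi16(x, 0xB1), 0xB1);
}

/* Hypothetical helper: read 32-bit lane i of v. */
static uint32_t lane32(__m128i v, int i) {
  uint32_t out[4];
  _mm_storeu_si128((__m128i *)out, v);
  return out[i];
}

/* Hypothetical demo: rotate a vector holding x in every lane and
 * return one lane of the result. */
static uint32_t rot16_demo_lane(uint32_t x, int lane) {
  return lane32(rot16_sse2(_mm_set1_epi32((int)x)), lane);
}
```

For example, `0x11223344` becomes `0x33441122` in every lane.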
Matthew Krupcale 40a4a2b6b0 SSE2 intrinsic: emulate _mm_blend_epi16 SSE4.1 intrinsic with SSE2 intrinsics
Use a constant mask to blend according to (mask & b) | ((~mask) & a).

 * src/rust_sse2.rs: emulate _mm_blend_epi16 using SSE2 intrinsics
 * c/blake3_sse2.c: Likewise.
2020-08-24 00:55:06 -04:00
Matthew Krupcale d91f20dd29 Start SSE2 implementation based on SSE4.1 version
Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.

 * Cargo.toml: add no_sse2 feature
 * benches/bench.rs: wire SSE2 benchmarks
 * build.rs: add SSE2 rust intrinsics and assembly builds
 * c/Makefile.testing: add SSE2 C and assembly targets
 * c/README.md: add SSE2 to C build instructions
 * c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
 * c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
 * c/blake3_dispatch.c: add SSE2 C dispatch
 * c/blake3_impl.h: add SSE2 C function prototypes
 * c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
 * c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
   assembly files starting with SSE4.1 version
 * src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
 * src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module
   configurations
 * src/platform.rs: add SSE2 rust platform detection and dispatch
 * src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
 * tools/instruction_set_support/src/main.rs: add SSE2 feature detection
2020-08-24 00:54:46 -04:00
Samuel Neves adbf07d67a Fix #109
The default executable stack setting on Linux can be fixed in two different ways:

 - By adding the `.section .note.GNU-stack,"",%progbits` special incantation
 - By passing the `--noexecstack` flag to the assembler

This patch implements both, but only one of them is strictly necessary.

I've also added some additional hardening flags to the Makefile. May not be portable.
2020-08-23 22:32:36 +01:00
Jack O'Connor 8dc30a2737 assembly authorship in the README 2020-08-19 10:52:32 -04:00
Jack O'Connor 09cc03614d the same hex example for rustdocs 2020-08-14 11:33:53 -04:00
Jack O'Connor 88094169fc tweak the readme hex example 2020-08-14 11:30:46 -04:00
Alexx Roche 7589c8c608 How to access the hex inside the Hash()
`blake3::hash(b"str")` returns `Hash(hex)`; this shows how to get the `hex` out of the `Hash()` as a str.
2020-08-14 11:22:56 -04:00
Jack O'Connor 63d27d4d1e version 0.3.6
Changes since 0.3.5:
- Fix a build break in the assembly files under older versions of GCC.
2020-07-29 19:21:23 -04:00
Samuel Neves b01784a057 support compilers without __has_include 2020-07-30 00:03:37 +01:00
Jack O'Connor e83cbbb8f5 clarify multithreading support in the C readme
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/99.
2020-07-20 10:01:00 -04:00
Jack O'Connor e4703ac170 rename the C Makefile to Makefile.testing 2020-07-20 09:47:38 -04:00
Jack O'Connor 7d0de7be14 version 0.3.5
Changes since 0.3.4:
- The `digest` dependency is now v0.9 and the `crypto-mac` dependency is
  now v0.8.
- Intel CET is supported in the assembly implementations.
- `b3sum` error output includes filepaths again.
2020-07-10 12:21:12 -04:00
Jack O'Connor 2f6f56f347 stop being a jerk and add the context string to test_vectors.json 2020-06-29 16:38:53 -04:00
Samuel Neves f2005678f8 Merge pull request #96 from BLAKE3-team/cet
Assembly: enable CET
2020-06-27 18:04:02 +01:00
Samuel Neves a3ec6c1ccf enable CET on asm 2020-06-27 17:44:43 +01:00
Jack O'Connor c908847c3f shrink a stack array that's twice as big as it needs to be
It looks like I originally made this mistake when I was copying code
from the baokeshed prototype (a274a9b0fa),
and then it got replicated into the C implementation later.
2020-06-26 16:16:55 -04:00
Jack O'Connor e0f193ddc9 put the file name in b3sum error output
This was previously there, but got dropped in
c5c07bb337.
2020-06-24 18:02:16 -04:00
Jack O'Connor 4c41a893a0 a little bit of cleanup and more testing 2020-06-14 14:35:47 -04:00
Justus K 1ecb14ce34 Replace std::io::copy with clone_from_slice 2020-06-14 14:35:14 -04:00
Justus K 7eea9b4c75 Bump digest to 0.9.0 and crypto-mac to 0.8.0 2020-06-14 14:35:14 -04:00
Jack O'Connor e63ad97e8b link to prebuilt binaries from the b3sum README 2020-05-26 00:25:39 -04:00
Jack O'Connor f287b56bc6 all-capitalize "FILE" in the b3sum help output 2020-05-25 21:22:29 -04:00
Jack O'Connor 0215604c59 avoid repeating a string 2020-05-25 21:20:43 -04:00
Jack O'Connor ca9687e36c fix another small mistake in the docs 2020-05-23 15:16:02 -04:00
Jack O'Connor 0694d0f6a5 fix a typo in the docs 2020-05-23 15:13:19 -04:00
Jack O'Connor 8d6f0f2574 add a test comment 2020-05-23 14:56:43 -04:00
Jack O'Connor 7f154ceea3 version 0.3.4
Changes since 0.3.3:
- `b3sum` now supports the `--check` flag. This is intended to be a
  drop-in replacement for e.g. `md5sum --check` from Coreutils. The
  behavior is somewhat stricter than Coreutils with respect to invalid
  Unicode in filenames. For a complete description of how `--check`
  works, see the file `b3sum/what_does_check_do.md`.
- To support the `--check` feature, backslashes and newlines that appear
  in filenames are now escaped in the output of `b3sum`. This is done
  the same way as in Coreutils.
- To support `--check` interoperability between Unix and Windows,
  backslashes in filepaths on Windows are now replaced with forward
  slashes in the output of `b3sum`. Note that this is different from
  Coreutils.
2020-05-23 14:37:49 -04:00
Jack O'Connor cd093791ab remove an extra space in some help text 2020-05-23 14:28:10 -04:00
Jack O'Connor 48512ec4f0 use wild::args_os to support globbing on Windows 2020-05-23 12:46:45 -04:00
Jack O'Connor c9a1676942 add support for --quiet to `b3sum --check`
Suggested by @llowrey:
https://github.com/BLAKE3-team/BLAKE3/issues/33#issuecomment-629853747
2020-05-23 12:27:48 -04:00
Jack O'Connor cd436251b6 some more clarifications in the --check docs 2020-05-16 13:29:10 -04:00
Jack O'Connor e1f3043e76 clarify the replacement character example 2020-05-15 16:11:11 -04:00
Jack O'Connor c71d88ce37 small typo 2020-05-15 16:05:18 -04:00
Jack O'Connor e8a868d6e5 finish the --check documentation 2020-05-15 15:58:29 -04:00
Jack O'Connor 82005455be link to the test vectors from the README 2020-05-15 13:19:59 -04:00
Jack O'Connor ae8cf2f924 start documenting the --check flag 2020-05-14 18:35:31 -04:00
Jack O'Connor 5651ce7ee0 enable clap default features
These are nice to have. I used to think this would increase build times,
but in practice it doesn't.
2020-05-14 11:32:05 -04:00
Jack O'Connor 86d5a13731 clarify that --no-mmap disables threading 2020-05-14 11:29:28 -04:00