mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-06-09 09:06:04 +02:00
Commit Graph

6 Commits

Author SHA1 Message Date
Matthew Krupcale e4681ec39e C: asm: emulate pshufb ROT8 using SSE2 instructions
Use a simple shift for the rotation.

 * c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
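The ROT8 emulation in e4681ec39e can be illustrated with C intrinsics instead of assembly. The commit message mentions only a shift; a rotation of each 32-bit lane generally takes a right shift, a left shift, and an OR, so this sketch assumes that form. The names `rot8_sse2` and `lane` are hypothetical and do not come from the repository.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate every 32-bit lane right by 8 bits without pshufb (SSSE3):
 * (x >> 8) | (x << 24), using only SSE2 shifts and OR. */
static __m128i rot8_sse2(__m128i x) {
    return _mm_or_si128(_mm_srli_epi32(x, 8), _mm_slli_epi32(x, 24));
}

/* Hypothetical helper: read 32-bit lane i (0..3) of a vector. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    assert(i >= 0 && i < 4);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

For example, a lane holding 0x11223344 becomes 0x44112233 after `rot8_sse2`.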
Matthew Krupcale 769c7cdc96 C: asm: emulate pshufb ROT16 using SSE2 instructions
Use two 16-bit shuffles: one for the low 64-bits and one for the high 64-bits.

 * c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
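The two-shuffle approach described in 769c7cdc96 can be sketched with C intrinsics. Rotating a 32-bit lane by 16 bits is the same as swapping its two 16-bit halves, which `pshuflw` and `pshufhw` can do for the low and high 64 bits respectively. The shuffle immediate 0xB1 and the names `rot16_sse2` and `lane` are assumptions made for this illustration, not taken from the assembly files.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate every 32-bit lane by 16 bits by swapping adjacent 16-bit words.
 * 0xB1 selects words in the order 1,0,3,2 within each 64-bit half. */
static __m128i rot16_sse2(__m128i x) {
    x = _mm_shufflelo_epi16(x, 0xB1); /* pshuflw: low 64 bits  */
    x = _mm_shufflehi_epi16(x, 0xB1); /* pshufhw: high 64 bits */
    return x;
}

/* Hypothetical helper: read 32-bit lane i (0..3) of a vector. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    assert(i >= 0 && i < 4);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

For example, a lane holding 0x11223344 becomes 0x33441122 after `rot16_sse2`.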
Matthew Krupcale 1ef915dbea C: asm: emulate pinsrd using SSE2 instructions
Use two pinsrw and a 16-bit shift to insert the 32-bit integer at the desired location.

 * c/blake3_sse2_x86-64_unix.S: emulate pinsrd using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:39 -04:00
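The `pinsrd` emulation from 1ef915dbea can be sketched in C intrinsics: the 32-bit value is split into two 16-bit halves (the high half obtained with a 16-bit shift), and each half is inserted with `pinsrw`. This is an illustration under assumptions, not the repository's assembly; the function name `insert32_lane1_sse2` and the choice of lane 1 are hypothetical. The lane is fixed because the `pinsrw` index must be a compile-time constant.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Insert w into 32-bit lane 1 without pinsrd (SSE4.1).
 * Lane 1 corresponds to 16-bit words 2 (low half) and 3 (high half). */
static __m128i insert32_lane1_sse2(__m128i v, uint32_t w) {
    v = _mm_insert_epi16(v, (int)(w & 0xFFFFu), 2); /* pinsrw: low 16 bits  */
    v = _mm_insert_epi16(v, (int)(w >> 16), 3);     /* pinsrw: high 16 bits */
    return v;
}

/* Hypothetical helper: read 32-bit lane i (0..3) of a vector. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    assert(i >= 0 && i < 4);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

Only the targeted lane changes; the other three lanes keep their previous values.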
Matthew Krupcale e632967a8d C: asm: emulate blendvps using SSE2 instructions
Blend according to (mask & b) | ((~mask) & a).

 * c/blake3_sse2_x86-64_unix.S: emulate blendvps using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:28 -04:00
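The blend formula in e632967a8d, (mask & b) | ((~mask) & a), maps directly onto SSE2 bitwise instructions. The sketch below assumes every 32-bit lane of the mask is either all ones or all zeros, as a comparison result would be; `blendvps` itself only inspects the sign bit of each lane, so an arbitrary mask would first need its sign bit broadcast (for example with an arithmetic right shift by 31). The names `blendv_sse2` and `lane` are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Per-lane select without blendvps (SSE4.1): take b where mask is set,
 * a elsewhere. Assumes each mask lane is 0x00000000 or 0xFFFFFFFF. */
static __m128i blendv_sse2(__m128i a, __m128i b, __m128i mask) {
    __m128i from_b = _mm_and_si128(mask, b);    /* mask & b    */
    __m128i from_a = _mm_andnot_si128(mask, a); /* (~mask) & a */
    return _mm_or_si128(from_b, from_a);
}

/* Hypothetical helper: read 32-bit lane i (0..3) of a vector. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    assert(i >= 0 && i < 4);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

With a = [10, 11, 12, 13], b = [20, 21, 22, 23] and a mask set in lanes 1 and 3, the result is [10, 21, 12, 23].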
Matthew Krupcale 460c9d3031 C: asm: emulate pblendw using SSE2 instructions
Use a constant mask to blend according to (mask & b) | ((~mask) & a).

 * c/blake3_sse2_x86-64_unix.S: emulate pblendw using SSE2 instructions for x86_64 unix
 * c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
 * c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 00:57:09 -04:00
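The `pblendw` emulation in 460c9d3031 uses the same (mask & b) | ((~mask) & a) formula, with a mask that depends only on the instruction's 8-bit immediate: bit i of the immediate selects 16-bit word i from b. The sketch below derives that mask from the immediate at run time to make the relationship visible; for a fixed immediate the mask is a constant that could be stored precomputed, which is presumably what the commit's "constant mask" refers to. The names `blend_epi16_sse2` and `word` are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Per-word select without pblendw (SSE4.1): word i comes from b when
 * bit i of imm8 is set, otherwise from a. */
static __m128i blend_epi16_sse2(__m128i a, __m128i b, int imm8) {
    /* Word i holds the single bit (1 << i). */
    const __m128i bits =
        _mm_set_epi16(0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01);
    __m128i mask = _mm_set1_epi16((short)imm8); /* broadcast immediate   */
    mask = _mm_and_si128(mask, bits);           /* isolate bit i per word */
    mask = _mm_cmpeq_epi16(mask, bits);         /* 0xFFFF where bit set   */
    return _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}

/* Hypothetical helper: read 16-bit word i (0..7) of a vector. */
static uint16_t word(__m128i v, int i) {
    uint16_t out[8];
    assert(i >= 0 && i < 8);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

With a holding words 0 through 7, b holding words 100 through 107, and an immediate of 0xCC, the result is [0, 1, 102, 103, 4, 5, 106, 107].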
Matthew Krupcale d91f20dd29 Start SSE2 implementation based on SSE4.1 version
Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.

 * Cargo.toml: add no_sse2 feature
 * benches/bench.rs: wire SSE2 benchmarks
 * build.rs: add SSE2 rust intrinsics and assembly builds
 * c/Makefile.testing: add SSE2 C and assembly targets
 * c/README.md: add SSE2 to C build instructions
 * c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
 * c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
 * c/blake3_dispatch.c: add SSE2 C dispatch
 * c/blake3_impl.h: add SSE2 C function prototypes
 * c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
 * c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
   assembly files starting with SSE4.1 version
 * src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
 * src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module
   configurations
 * src/platform.rs: add SSE2 rust platform detection and dispatch
 * src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
 * tools/instruction_set_support/src/main.rs: add SSE2 feature detection
2020-08-24 00:54:46 -04:00