1
0
Fork 0
mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-05-30 03:16:23 +02:00
Commit Graph

62 Commits

Author SHA1 Message Date
Jack O'Connor e83cbbb8f5 clarify multithreading support in the C readme
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/99.
2020-07-20 10:01:00 -04:00
Jack O'Connor e4703ac170 rename the C Makefile to Makefile.testing 2020-07-20 09:47:38 -04:00
Jack O'Connor 2f6f56f347 stop being a jerk and add the context string to test_vectors.json 2020-06-29 16:38:53 -04:00
Samuel Neves f2005678f8
Merge pull request #96 from BLAKE3-team/cet
Assembly: enable CET
2020-06-27 18:04:02 +01:00
Samuel Neves a3ec6c1ccf enable CET on asm 2020-06-27 17:44:43 +01:00
Jack O'Connor c908847c3f shrink a stack array that's twice as big as it needs to be
It looks like I originally made this mistake when I was copying code
from the baokeshed prototype (a274a9b0fa),
and then it got replicated into the C implementation later.
2020-06-26 16:16:55 -04:00
Samuel Neves 7ef795d62e Do not require AVX512DQ
Whereas vinserti64x4 is present on AVX512F, vinserti32x8 requires
AVX512DQ, which we do not test for. At this point there is not
much risk of incompatibility, since Skylake-X chips have all the
requires instruction sets, but let's be precise about this.
2020-04-12 11:38:11 +01:00
Samuel Neves eec458d03e move prototypes to shared header file, and make all local functions static. 2020-03-31 21:21:08 +01:00
Jack O'Connor e06a0f255a refactor the Cargo feature set
The biggest change here is that assembly implementations are enabled by
default.

Added features:
- "pure" (Pure Rust, with no C or assembly implementations.)

Removed features:
- "c" (Now basically the default.)

Renamed features;
- "c_prefer_intrinsics" -> "prefer_intrinsics"
- "c_neon" -> "neon"

Unchanged:
- "rayon"
- "std" (Still the only feature on by default.)
2020-03-29 18:02:03 -04:00
Samuel Neves 13556be388 save missing clobbered registers on Windows 2020-03-29 05:53:37 +01:00
Jack O'Connor c26a37f70c C files -> C and assembly files 2020-03-25 17:25:48 -04:00
Jack O'Connor c3639b4255 c/README.md changes
The C implementation now supports output seeking. Also expand the API
section a bit, and reorganize things to put the example on top.
2020-03-25 17:11:36 -04:00
Jack O'Connor a4ceef3932 add blake3_hasher_finalize_seek to the C API 2020-03-25 17:11:36 -04:00
Jack O'Connor 9d77bd6958 correct a comment 2020-03-17 14:26:39 -04:00
Jack O'Connor 48f2f745d9 clean up the C example a bit 2020-03-01 17:33:36 -05:00
Jack O'Connor 0432f9c7a3 some comment typos 2020-02-27 09:52:46 -05:00
Jack O'Connor 8d84cfc0af remove a mis-optimization that hurt performance for uneven updates
If the total number of chunks hashed so far is e.g. 1, and update() is
called with e.g. 8 more chunks, we can't compress all 8 together. We
have to break the input up, to make sure that that 1 lone chunk CV gets
merged with its proper sibling, and that in general the correct layout
of the tree is preserved. What we should do is hash 1-2-4-1 chunks of
input, using increasing powers of 2 (with some cleanup at the end). What
we were doing was 2-2-2-2 chunks. This was the result of a mistaken
optimization that got us stuck with an always-odd number of chunks so
far.

Fixes https://github.com/BLAKE3-team/BLAKE3/issues/69.
2020-02-25 11:40:37 -05:00
Samuel Neves 421a21abd8 Fix bug inadvertently introduced in a1c4c4efb5 2020-02-13 16:08:07 +00:00
Samuel Neves 207915a751 Work around GCC bug 85328 by forcing trivially masked stores.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85328

Fixes #58.
2020-02-13 15:22:17 +00:00
Samuel Neves fa6f14cafa Work around clang bug 36144 by replacing anonymous label numbers.
https://bugs.llvm.org/show_bug.cgi?id=36144

Fixes #60.
2020-02-13 15:22:17 +00:00
Jack O'Connor fcc14c8c1b more file renaming, use underscores more consistently 2020-02-12 18:41:41 -05:00
Erik Johansson 0281f1ae16 Rename assembly files (blake3-* -> blake3_*)
This gives the assembly files the same prefix as the intrinsics files which
simplifies building when the build system should pick between the assembly and
the intrinsics files.
2020-02-12 23:08:44 +01:00
Jack O'Connor 1c4d7fdd8d add test_asm to the C Makefile 2020-02-12 13:12:05 -05:00
Jack O'Connor 7ee05ba3bd document how to build the C code with assembly implementations 2020-02-12 13:04:03 -05:00
Jack O'Connor b8a1d2d982 integrate assembly implementations into blake3_c_rust_bindings 2020-02-12 10:23:17 -05:00
Samuel Neves b6b3c27824 assembly implementations 2020-02-12 10:23:17 -05:00
Samuel Neves a1c4c4efb5 Fix #51.
Thanks to bit4 for spotting this bug.
2020-02-02 18:47:38 +00:00
TheVice 58926046ca [MSVC] added possible to compile at Microsoft Visual C compiler.
[main.c] removed including of unistd.h from c/main.c file.
[blake3_avx2.c|blake3_avx512.c|blake3_sse41.c] resolved compile error:
'C4146' - applying of unary minus operator to the unsigned value.
2020-01-30 16:17:46 -05:00
Jack O'Connor 3c098eecc1 formating in c/README.md 2020-01-29 13:05:44 -05:00
Jack O'Connor af0ef07519 update the c/README.md example to hash stdin 2020-01-29 13:01:40 -05:00
Jack O'Connor 37e153cc60 add NEON support to blake3_dispatch.c
Currently this requires setting the BLAKE3_USE_NEON preprocessor flag.
In the future we may enable this automatically on AArch32/64 or include
some kind of dynamic feature detection. (Though ARM makes this harder
than x86.)

As part of this, get rid of the IS_ARM flag. It wasn't being set
properly when I tried it on a Raspberry Pi.

Closes #30.
2020-01-28 15:59:16 -05:00
Jack O'Connor d7a37fa54d clear errno before strtoull
I ran into a bug on ARM where we were getting non-zero here, from
something else that stuck around in error.
2020-01-28 14:11:26 -05:00
Jack O'Connor 4304cd1085 one more warning 2020-01-28 13:26:37 -05:00
Jack O'Connor d980514c44 fix unused variable warning 2020-01-28 13:25:22 -05:00
Jack O'Connor 6742722898 add a note about testing in main.c 2020-01-27 16:21:34 -05:00
TheVice 8ce1cddedc [memset] removed call of 'memset' function according to the overwriting
of it content inside of blake3_hasher_finalize function.
2020-01-27 16:17:09 -05:00
TheVice 4730ab237e [memset] placed function after checking of memory was done
on which it should be apply.
2020-01-27 16:17:09 -05:00
Jack O'Connor dec0c49576 add a note about AVX-512 flags 2020-01-27 13:10:25 -05:00
Jack O'Connor 444a338b45 remove an obsolete remark about performance 2020-01-27 13:04:36 -05:00
Jack O'Connor 71e605fd5d
typo 2020-01-26 16:12:10 -05:00
Jack O'Connor 1db856a3e5 expand the C README for public consumption 2020-01-26 16:07:51 -05:00
Erik Johansson 182aea4871 Add extern "C" to blake3.h
So that the header can be included in C++-programs without getting linker
errors.
2020-01-23 20:42:34 +01:00
Samuel Neves a830ab2661 streamline load_counters
avx2 before:

        mov     eax, esi
        neg     rax
        vmovq   xmm0, rax
        vpbroadcastq    ymm0, xmm0
        vpand   ymm0, ymm0, ymmword ptr [rip + .LCPI1_0]
        vmovq   xmm2, rdi
        vpbroadcastq    ymm1, xmm2
        vpaddq  ymm1, ymm0, ymm1
        vmovdqa ymm0, ymmword ptr [rip + .LCPI1_1] # ymm0 = [0,2,4,6,4,6,6,7]
        vpermd  ymm3, ymm0, ymm1
        mov     r8d, eax
        and     r8d, 5
        add     r8, rdi
        mov     esi, eax
        and     esi, 6
        add     rsi, rdi
        and     eax, 7
        vpshufd xmm4, xmm3, 231         # xmm4 = xmm3[3,1,2,3]
        vpinsrd xmm4, xmm4, r8d, 1
        add     rax, rdi
        vpinsrd xmm4, xmm4, esi, 2
        vpinsrd xmm4, xmm4, eax, 3
        vpshufd xmm3, xmm3, 144         # xmm3 = xmm3[0,0,1,2]
        vpinsrd xmm3, xmm3, edi, 0
        vmovdqa xmmword ptr [rdx], xmm3
        vmovdqa xmmword ptr [rdx + 16], xmm4
        vpermq  ymm3, ymm1, 144         # ymm3 = ymm1[0,0,1,2]
        vpblendd        ymm2, ymm3, ymm2, 3 # ymm2 = ymm2[0,1],ymm3[2,3,4,5,6,7]
        vpsrlq  ymm2, ymm2, 32
        vpermd  ymm2, ymm0, ymm2
        vextracti128    xmm1, ymm1, 1
        vmovq   xmm3, rax
        vmovq   xmm4, rsi
        vpunpcklqdq     xmm3, xmm4, xmm3 # xmm3 = xmm4[0],xmm3[0]
        vmovq   xmm4, r8
        vpalignr        xmm1, xmm4, xmm1, 8 # xmm1 = xmm1[8,9,10,11,12,13,14,15],xmm4[0,1,2,3,4,5,6,7]
        vinserti128     ymm1, ymm1, xmm3, 1
        vpsrlq  ymm1, ymm1, 32
        vpermd  ymm0, ymm0, ymm1

avx2 after:

        neg     esi
        vmovd   xmm0, esi
        vpbroadcastd    ymm0, xmm0
        vmovd   xmm1, edi
        vpbroadcastd    ymm1, xmm1
        vpand   ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
        vpaddd  ymm1, ymm1, ymm0
        vpbroadcastd    ymm2, dword ptr [rip + .LCPI0_1] # ymm2 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
        vpor    ymm0, ymm0, ymm2
        vpxor   ymm2, ymm1, ymm2
        vpcmpgtd        ymm0, ymm0, ymm2
        shr     rdi, 32
        vmovd   xmm2, edi
        vpbroadcastd    ymm2, xmm2
        vpsubd  ymm0, ymm2, ymm0
2020-01-23 12:17:43 +00:00
Samuel Neves de1458c565 name collision 2020-01-23 11:51:46 +00:00
Samuel Neves 37ea737c16 more robust bit-trickery functions 2020-01-23 10:58:45 +00:00
Jack O'Connor 163f52245d port compress_subtree_to_parent_node from Rust to C
This recursive function performs parallel parent node hashing, which is
an important optimization.
2020-01-22 21:32:39 -05:00
Jack O'Connor de1cf0038e add the round_down_to_power_of_2 algoirthm
This could probably be sped up by detecting LZCNT support, but it's
unlikely to be a bottleneck.
2020-01-22 21:32:39 -05:00
Jack O'Connor 087d72e08f clang-format 2020-01-22 21:32:35 -05:00
Jack O'Connor 92d421dea1 add a larger test case
One thing I like to test is that, if I hack simd_degree to be higher
than MAX_SIMD_DEGREE, assertions fire. This requires a test case long
enough to exceed that number of chunks.
2020-01-22 21:19:47 -05:00
Jack O'Connor d0c8fc16b3 use a better popcnt fallback algorithm
This one loops once for every set bit, rather than once for each bit
position to the right of the highest set bit.

https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation
2020-01-21 10:47:00 -05:00