mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-05-23 08:46:06 +02:00
Commit Graph

203 Commits

Author SHA1 Message Date
Jack O'Connor d7a37fa54d clear errno before strtoull
I ran into a bug on ARM where we were getting non-zero here, from
something else that stuck around in error.
2020-01-28 14:11:26 -05:00
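A minimal sketch of the pattern this commit describes: `strtoull` only sets `errno` on failure and never resets it, so a stale value from an earlier call can look like a parse error unless `errno` is cleared first. The helper name `parse_u64` is hypothetical and is not taken from the repository.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not from the repository): parse a base-10 unsigned
 * integer, returning false on any parse error or overflow. */
bool parse_u64(const char *s, unsigned long long *out) {
  char *end = NULL;
  errno = 0; /* clear any stale value left behind by an earlier call */
  unsigned long long value = strtoull(s, &end, 10);
  if (errno != 0) {
    return false; /* e.g. ERANGE on overflow */
  }
  if (end == s || *end != '\0') {
    return false; /* no digits consumed, or trailing garbage */
  }
  *out = value;
  return true;
}
```

Without the `errno = 0` line, a leftover non-zero `errno` would make the first check reject valid input.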
Jack O'Connor 4304cd1085 one more warning 2020-01-28 13:26:37 -05:00
Jack O'Connor d980514c44 fix unused variable warning 2020-01-28 13:25:22 -05:00
Jack O'Connor 6742722898 add a note about testing in main.c 2020-01-27 16:21:34 -05:00
TheVice 8ce1cddedc [memset] removed the call to 'memset', because the memory is
overwritten inside the blake3_hasher_finalize function anyway.
2020-01-27 16:17:09 -05:00
TheVice 4730ab237e [memset] moved the call to after the check of the memory
it is applied to.
2020-01-27 16:17:09 -05:00
Jack O'Connor dec0c49576 add a note about AVX-512 flags 2020-01-27 13:10:25 -05:00
Jack O'Connor 444a338b45 remove an obsolete remark about performance 2020-01-27 13:04:36 -05:00
Jack O'Connor 5ef22de9d0 link to the C implementation from the README 2020-01-27 13:02:00 -05:00
Jack O'Connor 71e605fd5d typo 2020-01-26 16:12:10 -05:00
Jack O'Connor 1db856a3e5 expand the C README for public consumption 2020-01-26 16:07:51 -05:00
Samuel Neves 214c70d8f3 Merge pull request #40 from erijo/cpp
Add extern "C" to blake3.h
2020-01-24 00:42:41 +00:00
Erik Johansson 182aea4871 Add extern "C" to blake3.h
So that the header can be included in C++-programs without getting linker
errors.
2020-01-23 20:42:34 +01:00
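The usual shape of this fix, sketched on a made-up header rather than on blake3.h itself: the declarations are wrapped in an `extern "C"` block that only a C++ compiler sees, so C++ callers look up the unmangled C symbol names. The guard name `EXAMPLE_API_H` and the function `example_count_zero_bytes` are assumptions for illustration.

```c
/* --- header part (hypothetical example_api.h) --- */
#ifndef EXAMPLE_API_H
#define EXAMPLE_API_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" { /* only visible to C++ compilers; disables name mangling */
#endif

size_t example_count_zero_bytes(const unsigned char *buf, size_t len);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_API_H */

/* --- definition, which would normally live in a separate .c file --- */
size_t example_count_zero_bytes(const unsigned char *buf, size_t len) {
  size_t count = 0;
  for (size_t i = 0; i < len; i++) {
    if (buf[i] == 0) {
      count++;
    }
  }
  return count;
}
```

A C compiler ignores the wrapper entirely, so the header stays valid for both languages.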
Samuel Neves a830ab2661 streamline load_counters
avx2 before:

        mov     eax, esi
        neg     rax
        vmovq   xmm0, rax
        vpbroadcastq    ymm0, xmm0
        vpand   ymm0, ymm0, ymmword ptr [rip + .LCPI1_0]
        vmovq   xmm2, rdi
        vpbroadcastq    ymm1, xmm2
        vpaddq  ymm1, ymm0, ymm1
        vmovdqa ymm0, ymmword ptr [rip + .LCPI1_1] # ymm0 = [0,2,4,6,4,6,6,7]
        vpermd  ymm3, ymm0, ymm1
        mov     r8d, eax
        and     r8d, 5
        add     r8, rdi
        mov     esi, eax
        and     esi, 6
        add     rsi, rdi
        and     eax, 7
        vpshufd xmm4, xmm3, 231         # xmm4 = xmm3[3,1,2,3]
        vpinsrd xmm4, xmm4, r8d, 1
        add     rax, rdi
        vpinsrd xmm4, xmm4, esi, 2
        vpinsrd xmm4, xmm4, eax, 3
        vpshufd xmm3, xmm3, 144         # xmm3 = xmm3[0,0,1,2]
        vpinsrd xmm3, xmm3, edi, 0
        vmovdqa xmmword ptr [rdx], xmm3
        vmovdqa xmmword ptr [rdx + 16], xmm4
        vpermq  ymm3, ymm1, 144         # ymm3 = ymm1[0,0,1,2]
        vpblendd        ymm2, ymm3, ymm2, 3 # ymm2 = ymm2[0,1],ymm3[2,3,4,5,6,7]
        vpsrlq  ymm2, ymm2, 32
        vpermd  ymm2, ymm0, ymm2
        vextracti128    xmm1, ymm1, 1
        vmovq   xmm3, rax
        vmovq   xmm4, rsi
        vpunpcklqdq     xmm3, xmm4, xmm3 # xmm3 = xmm4[0],xmm3[0]
        vmovq   xmm4, r8
        vpalignr        xmm1, xmm4, xmm1, 8 # xmm1 = xmm1[8,9,10,11,12,13,14,15],xmm4[0,1,2,3,4,5,6,7]
        vinserti128     ymm1, ymm1, xmm3, 1
        vpsrlq  ymm1, ymm1, 32
        vpermd  ymm0, ymm0, ymm1

avx2 after:

        neg     esi
        vmovd   xmm0, esi
        vpbroadcastd    ymm0, xmm0
        vmovd   xmm1, edi
        vpbroadcastd    ymm1, xmm1
        vpand   ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
        vpaddd  ymm1, ymm1, ymm0
        vpbroadcastd    ymm2, dword ptr [rip + .LCPI0_1] # ymm2 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
        vpor    ymm0, ymm0, ymm2
        vpxor   ymm2, ymm1, ymm2
        vpcmpgtd        ymm0, ymm0, ymm2
        shr     rdi, 32
        vmovd   xmm2, edi
        vpbroadcastd    ymm2, xmm2
        vpsubd  ymm0, ymm2, ymm0
2020-01-23 12:17:43 +00:00
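Read from the "after" assembly alone, the streamlined version appears to work on 32-bit lanes: it adds a masked lane offset to the low half of the counter, detects wraparound with an unsigned comparison, and adds the resulting carry to the high half. The scalar C sketch below restates that idea under stated assumptions. The function name, the signature, and the fixed lane count of 8 are illustrative; this is not the repository's source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar restatement of the 32-bit-lane approach suggested by the assembly.
 * lo[i] and hi[i] receive the low and high words of (counter + i) when
 * increment is true, or of counter itself when it is false. */
void load_counters_sketch(uint64_t counter, bool increment,
                          uint32_t lo[8], uint32_t hi[8]) {
  /* All-ones when incrementing, zero otherwise (the neg + broadcast step). */
  uint32_t mask = increment ? UINT32_MAX : 0;
  uint32_t counter_lo = (uint32_t)counter;
  uint32_t counter_hi = (uint32_t)(counter >> 32);
  for (uint32_t i = 0; i < 8; i++) {
    uint32_t offset = mask & i;          /* the vpand with the lane indices */
    uint32_t sum = counter_lo + offset;  /* wraps modulo 2^32 */
    uint32_t carry = (sum < offset);     /* unsigned compare detects the wrap */
    lo[i] = sum;
    hi[i] = counter_hi + carry;
  }
}
```

AVX2 has no unsigned 32-bit compare, which would explain why the assembly flips the sign bit of both operands (the 2147483648 constant) before the signed `vpcmpgtd`.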
Samuel Neves de1458c565 name collision 2020-01-23 11:51:46 +00:00
Samuel Neves 37ea737c16 more robust bit-trickery functions 2020-01-23 10:58:45 +00:00
Jack O'Connor e17c45ddd5 version 0.1.3
Changes since 0.1.2:
- All x86 implementations include _mm_prefetch optimizations. These
  improve performance for very large inputs.
- The C implementation performs parallel parent hashing, matching the
  performance of the single-threaded Rust implementation.
- b3sum supports --no-mmap. Contributed by @cesarb.
2020-01-22 21:35:24 -05:00
Jack O'Connor 163f52245d port compress_subtree_to_parent_node from Rust to C
This recursive function performs parallel parent node hashing, which is
an important optimization.
2020-01-22 21:32:39 -05:00
Jack O'Connor de1cf0038e add the round_down_to_power_of_2 algorithm
This could probably be sped up by detecting LZCNT support, but it's
unlikely to be a bottleneck.
2020-01-22 21:32:39 -05:00
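The commit text does not show which algorithm was added, so the following is only one common portable way to round down to a power of two without LZCNT, assuming a 64-bit input. The function name `round_down_pow2_sketch` is hypothetical.

```c
#include <stdint.h>

/* Returns the largest power of 2 that is <= x, or 0 when x is 0.
 * A portable bit-smearing sketch; not necessarily the repository's code. */
uint64_t round_down_pow2_sketch(uint64_t x) {
  /* Copy the highest set bit into every lower position. */
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;
  /* x is now of the form 0...011...1; dropping the lower ones leaves
   * only the highest set bit. */
  return x - (x >> 1);
}
```

With LZCNT or a compiler builtin, the same result could come from a single shift, which is the speed-up the commit message alludes to.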
Jack O'Connor 087d72e08f clang-format 2020-01-22 21:32:35 -05:00
Jack O'Connor 92d421dea1 add a larger test case
One thing I like to test is that, if I hack simd_degree to be higher
than MAX_SIMD_DEGREE, assertions fire. This requires a test case long
enough to exceed that number of chunks.
2020-01-22 21:19:47 -05:00
Jack O'Connor 78e858d050 expand comments about lazy merging 2020-01-21 12:09:42 -05:00
Jack O'Connor ccadbad244 stack size in the optimized impl should be MAX_DEPTH + 1 2020-01-21 11:41:20 -05:00
Jack O'Connor d0c8fc16b3 use a better popcnt fallback algorithm
This one loops once for every set bit, rather than once for each bit
position to the right of the highest set bit.

https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation
2020-01-21 10:47:00 -05:00
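The behaviour described here (one loop iteration per set bit) matches the classic technique from the linked Wikipedia article, where `x & (x - 1)` clears the lowest set bit. A minimal sketch, with the hypothetical name `popcnt_fallback_sketch`:

```c
#include <stdint.h>

/* Counts set bits, looping once per set bit rather than once per
 * bit position. Illustrative; not copied from the repository. */
unsigned int popcnt_fallback_sketch(uint64_t x) {
  unsigned int count = 0;
  while (x != 0) {
    count += 1;
    x &= x - 1; /* clear the lowest set bit */
  }
  return count;
}
```

For an input such as `1ULL << 63` this loops once, whereas a shift-and-test loop would need 64 iterations.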
Jack O'Connor 67262dff31 double the maximum incremental subtree size
Because compress_subtree_to_parent_node effectively cuts its input in
half, we can give it an input that's twice as big, without violating the
CV stack invariant.
2020-01-20 19:25:55 -05:00
Jack O'Connor 4a92e8eeb1 add the reference impl doc test to CI 2020-01-20 16:36:30 -05:00
Jack O'Connor 4021636022 test the BLAKE3_NO_* vars in CI 2020-01-20 16:19:16 -05:00
Jack O'Connor 40f4bdc22a switch from BLAKE3_USE_* to BLAKE3_NO_*
This means that compiling C sources includes all implementations by
default, which is what most callers are going to want.
2020-01-20 15:24:03 -05:00
Samuel Neves 66da5afb0c make things more modular 2020-01-20 12:03:31 -05:00
Jack O'Connor 491f799fd9 clarify the --no-mmap logic a bit 2020-01-20 12:03:31 -05:00
Cesar Eduardo Barros 273a679ddc b3sum: add no-mmap option
Using mmap is not always the best option. For instance, if the file is
truncated while being read, b3sum will receive a SIGBUS and abort.

Follow ripgrep's lead and add a --no-mmap option to disable mmap. This
can also help benchmark the mmap versus the read path, and help debug
performance issues potentially caused by mmap access patterns (like
issue #32).
2020-01-20 11:58:07 -05:00
Samuel Neves b8c33e11ef manually prefetch message blocks 2020-01-19 18:45:37 +00:00
Jack O'Connor a3147eb909 comment about parallelism 2020-01-18 14:32:52 -05:00
Jack O'Connor 14cd5c51c4 version 0.1.2
Changes since 0.1.1:
- b3sum no longer mmaps files smaller than 16 KiB. This improves
  performance for hashing many small files. Contributed by @xzfc.
- b3sum now supports --raw output. Contributed by @phayes.
2020-01-17 13:58:55 -05:00
Jack O'Connor 7ee89fe738 update b3sum help text in README.md 2020-01-17 13:54:58 -05:00
Jack O'Connor e2ce07601f edit the --raw help string 2020-01-17 13:36:09 -05:00
Jack O'Connor 2db9f2d2ea Merge pull request #22 from phayes/raw_output
Adds support for raw output to b3sum
2020-01-17 13:29:39 -05:00
Albert Safin f26880e282 b3sum: do not mmap files smaller than 16 KiB 2020-01-17 12:58:32 -05:00
Jack O'Connor 28701d1585 add a README.md in c/blake3_c_rust_bindings 2020-01-16 18:29:20 -05:00
Jack O'Connor 84c26670bf add blake3_c_rust_bindings for testing and benchmarking 2020-01-16 16:09:42 -05:00
Jack O'Connor 33a9bee51f update the b3sum README 2020-01-15 10:46:47 -05:00
Jack O'Connor e60934a129 more consistent use of Self in the reference impl 2020-01-15 10:41:06 -05:00
phayes aec1d88e31 Using take() to limit the number of bytes copied 2020-01-14 14:35:18 -08:00
Jack O'Connor c8c442a99b add comments to the reference impl 2020-01-14 15:22:22 -05:00
phayes a02b4cb040 bailing early if we have both --raw and multiple files 2020-01-13 14:56:06 -08:00
phayes 0e8734b7f6 Making sure our raw multi-file test is testing what we think it is 2020-01-13 14:48:24 -08:00
phayes 5cb01ad696 Using stdout_capture for capturing stdout that is not a string 2020-01-13 14:43:09 -08:00
phayes 2bd7614d1e Fixing stdout locking 2020-01-13 14:40:30 -08:00
phayes ec1233bca3 Locking stdout for writing in a tight loop. 2020-01-13 14:36:28 -08:00
phayes 8d251af29f Adds support for raw output to b3sum 2020-01-13 13:12:47 -08:00