mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-09-21 08:11:36 +02:00
Commit Graph

267 Commits

Author SHA1 Message Date
Jack O'Connor
24071db346 re-export digest and crypto_mac 2020-02-04 10:02:46 -05:00
Jack O'Connor
0c663aa8ac add a link in the README to bar_chart.py
Closes https://github.com/BLAKE3-team/BLAKE3/issues/53.
2020-02-04 09:40:10 -05:00
Cesar Eduardo Barros
a3d42f724d Inline wrapper methods 2020-02-03 17:29:25 -05:00
Jack O'Connor
0de4412884 version 0.1.4
Changes since 0.1.3:
- Hasher supports the reset() method.
- Hasher implements several traits from the `digest` and `crypto_mac`
  crates.
- Bug fixes in the C implementation for MSVC and for 32-bit x86.
2020-02-03 12:05:26 -05:00
Jack O'Connor
0651736ff4 make the inherent reset() method return &mut self 2020-02-03 10:21:27 -05:00
Jack O'Connor
9ffe377d45 implement crypto_mac::Mac 2020-02-03 10:18:02 -05:00
Jack O'Connor
bcd424cab6 mention the digest traits in the docs 2020-02-02 17:40:30 -05:00
Jack O'Connor
9bab77d2cf implement traits from the digest crate 2020-02-02 17:28:22 -05:00
Jack O'Connor
e603983647 add Hasher::reset
Closes https://github.com/BLAKE3-team/BLAKE3/issues/41.
2020-02-02 16:38:29 -05:00
Samuel Neves
a1c4c4efb5 Fix #51.
Thanks to bit4 for spotting this bug.
2020-02-02 18:47:38 +00:00
TheVice
58926046ca [MSVC] made it possible to compile with the Microsoft Visual C compiler.
[main.c] removed the include of unistd.h from c/main.c.
[blake3_avx2.c|blake3_avx512.c|blake3_sse41.c] resolved compile error:
'C4146' - applying the unary minus operator to an unsigned value.
2020-01-30 16:17:46 -05:00
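The C4146 diagnostic mentioned in 58926046ca fires when unary minus is applied to an unsigned operand. A minimal sketch of one common way to avoid it, subtracting from zero instead, is shown here. The helper name `mask_from_flag` is hypothetical, and this is not necessarily the exact change made in that commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns all-ones when flag is true, zero otherwise.
   Writing -(uint32_t)flag triggers MSVC's C4146; the subtraction below
   produces the same wrapped value without unary minus on an unsigned type. */
static uint32_t mask_from_flag(bool flag) {
  return 0u - (uint32_t)flag;
}
```

Unsigned subtraction wraps modulo 2^32 in C, so the result is well defined on every conforming compiler.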
Jack O'Connor
3c098eecc1 formatting in c/README.md 2020-01-29 13:05:44 -05:00
Jack O'Connor
af0ef07519 update the c/README.md example to hash stdin 2020-01-29 13:01:40 -05:00
Jack O'Connor
37e153cc60 add NEON support to blake3_dispatch.c
Currently this requires setting the BLAKE3_USE_NEON preprocessor flag.
In the future we may enable this automatically on AArch32/64 or include
some kind of dynamic feature detection. (Though ARM makes this harder
than x86.)

As part of this, get rid of the IS_ARM flag. It wasn't being set
properly when I tried it on a Raspberry Pi.

Closes #30.
2020-01-28 15:59:16 -05:00
Jack O'Connor
d7a37fa54d clear errno before strtoull
I ran into a bug on ARM where we were getting non-zero here, from
something else that stuck around in errno.
2020-01-28 14:11:26 -05:00
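The reason for d7a37fa54d is that strtoull reports overflow only by setting errno and never resets it on success, so a stale value from an earlier call can look like a parse failure. A minimal sketch of the pattern follows. The helper name `parse_u64` and its validation rules are assumptions for illustration, not the code in main.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 unsigned integer from s.
   Returns true and stores the value in *out on success. */
static bool parse_u64(const char *s, uint64_t *out) {
  char *end = NULL;
  errno = 0; /* clear any stale error left behind by earlier calls */
  unsigned long long value = strtoull(s, &end, 10);
  if (errno != 0 || end == s || *end != '\0') {
    return false; /* overflow, no digits, or trailing garbage */
  }
  *out = (uint64_t)value;
  return true;
}
```

Without the `errno = 0` line, the first check could reject valid input whenever an unrelated earlier call had already set errno.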
Jack O'Connor
4304cd1085 one more warning 2020-01-28 13:26:37 -05:00
Jack O'Connor
d980514c44 fix unused variable warning 2020-01-28 13:25:22 -05:00
Jack O'Connor
6742722898 add a note about testing in main.c 2020-01-27 16:21:34 -05:00
TheVice
8ce1cddedc [memset] removed the call to 'memset', since the contents are
overwritten inside the blake3_hasher_finalize function.
2020-01-27 16:17:09 -05:00
TheVice
4730ab237e [memset] moved the call to after the check of the memory
it is applied to.
2020-01-27 16:17:09 -05:00
Jack O'Connor
dec0c49576 add a note about AVX-512 flags 2020-01-27 13:10:25 -05:00
Jack O'Connor
444a338b45 remove an obsolete remark about performance 2020-01-27 13:04:36 -05:00
Jack O'Connor
5ef22de9d0 link to the C implementation from the README 2020-01-27 13:02:00 -05:00
Jack O'Connor
71e605fd5d typo 2020-01-26 16:12:10 -05:00
Jack O'Connor
1db856a3e5 expand the C README for public consumption 2020-01-26 16:07:51 -05:00
Samuel Neves
214c70d8f3 Merge pull request #40 from erijo/cpp
Add extern "C" to blake3.h
2020-01-24 00:42:41 +00:00
Erik Johansson
182aea4871 Add extern "C" to blake3.h
So that the header can be included in C++ programs without getting linker
errors.
2020-01-23 20:42:34 +01:00
Samuel Neves
a830ab2661 streamline load_counters
avx2 before:

        mov     eax, esi
        neg     rax
        vmovq   xmm0, rax
        vpbroadcastq    ymm0, xmm0
        vpand   ymm0, ymm0, ymmword ptr [rip + .LCPI1_0]
        vmovq   xmm2, rdi
        vpbroadcastq    ymm1, xmm2
        vpaddq  ymm1, ymm0, ymm1
        vmovdqa ymm0, ymmword ptr [rip + .LCPI1_1] # ymm0 = [0,2,4,6,4,6,6,7]
        vpermd  ymm3, ymm0, ymm1
        mov     r8d, eax
        and     r8d, 5
        add     r8, rdi
        mov     esi, eax
        and     esi, 6
        add     rsi, rdi
        and     eax, 7
        vpshufd xmm4, xmm3, 231         # xmm4 = xmm3[3,1,2,3]
        vpinsrd xmm4, xmm4, r8d, 1
        add     rax, rdi
        vpinsrd xmm4, xmm4, esi, 2
        vpinsrd xmm4, xmm4, eax, 3
        vpshufd xmm3, xmm3, 144         # xmm3 = xmm3[0,0,1,2]
        vpinsrd xmm3, xmm3, edi, 0
        vmovdqa xmmword ptr [rdx], xmm3
        vmovdqa xmmword ptr [rdx + 16], xmm4
        vpermq  ymm3, ymm1, 144         # ymm3 = ymm1[0,0,1,2]
        vpblendd        ymm2, ymm3, ymm2, 3 # ymm2 = ymm2[0,1],ymm3[2,3,4,5,6,7]
        vpsrlq  ymm2, ymm2, 32
        vpermd  ymm2, ymm0, ymm2
        vextracti128    xmm1, ymm1, 1
        vmovq   xmm3, rax
        vmovq   xmm4, rsi
        vpunpcklqdq     xmm3, xmm4, xmm3 # xmm3 = xmm4[0],xmm3[0]
        vmovq   xmm4, r8
        vpalignr        xmm1, xmm4, xmm1, 8 # xmm1 = xmm1[8,9,10,11,12,13,14,15],xmm4[0,1,2,3,4,5,6,7]
        vinserti128     ymm1, ymm1, xmm3, 1
        vpsrlq  ymm1, ymm1, 32
        vpermd  ymm0, ymm0, ymm1

avx2 after:

        neg     esi
        vmovd   xmm0, esi
        vpbroadcastd    ymm0, xmm0
        vmovd   xmm1, edi
        vpbroadcastd    ymm1, xmm1
        vpand   ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
        vpaddd  ymm1, ymm1, ymm0
        vpbroadcastd    ymm2, dword ptr [rip + .LCPI0_1] # ymm2 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
        vpor    ymm0, ymm0, ymm2
        vpxor   ymm2, ymm1, ymm2
        vpcmpgtd        ymm0, ymm0, ymm2
        shr     rdi, 32
        vmovd   xmm2, edi
        vpbroadcastd    ymm2, xmm2
        vpsubd  ymm0, ymm2, ymm0
2020-01-23 12:17:43 +00:00
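The "after" listing in a830ab2661 appears to add the per-lane offsets in 32-bit lanes and then derive the carry into the high word with a compare, instead of doing 64-bit adds and shuffles. A scalar sketch of that idea is shown below, assuming eight lanes and offsets 0 through 7. The function name `load_counters_scalar` and the array-based interface are hypothetical, and this is not the intrinsics code from the repository.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8 /* assumption: eight 32-bit lanes, as in one 256-bit vector */

/* Hypothetical scalar model: split counter + i into low and high 32-bit
   words for each lane i, or replicate counter when increment_counter is
   false. */
static void load_counters_scalar(uint64_t counter, bool increment_counter,
                                 uint32_t lo[LANES], uint32_t hi[LANES]) {
  const uint32_t mask = 0u - (uint32_t)increment_counter; /* all-ones or zero */
  for (uint32_t i = 0; i < LANES; i++) {
    uint32_t add = i & mask;              /* lane offset, or 0 if disabled */
    uint32_t low = (uint32_t)counter + add; /* wraps modulo 2^32 */
    uint32_t carry = low < add;           /* 1 exactly when the add wrapped */
    lo[i] = low;
    hi[i] = (uint32_t)(counter >> 32) + carry;
  }
}
```

The vector listing performs the same unsigned comparison with a signed compare instruction by first flipping the sign bit of both operands, which is the usual workaround when no unsigned 32-bit compare is available.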
Samuel Neves
de1458c565 name collision 2020-01-23 11:51:46 +00:00
Samuel Neves
37ea737c16 more robust bit-trickery functions 2020-01-23 10:58:45 +00:00
Jack O'Connor
e17c45ddd5 version 0.1.3
Changes since 0.1.2:
- All x86 implementations include _mm_prefetch optimizations. These
  improve performance for very large inputs.
- The C implementation performs parallel parent hashing, matching the
  performance of the single-threaded Rust implementation.
- b3sum supports --no-mmap. Contributed by @cesarb.
2020-01-22 21:35:24 -05:00
Jack O'Connor
163f52245d port compress_subtree_to_parent_node from Rust to C
This recursive function performs parallel parent node hashing, which is
an important optimization.
2020-01-22 21:32:39 -05:00
Jack O'Connor
de1cf0038e add the round_down_to_power_of_2 algorithm
This could probably be sped up by detecting LZCNT support, but it's
unlikely to be a bottleneck.
2020-01-22 21:32:39 -05:00
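One portable way to round down to a power of two without LZCNT, as de1cf0038e describes, is to smear the highest set bit into every lower position and then keep only the top bit. The sketch below uses that technique. The name `round_down_to_power_of_2_sketch` is hypothetical, and the commit itself may use a different algorithm.

```c
#include <stdint.h>

/* Hypothetical helper: largest power of two less than or equal to x.
   Returns 0 for x == 0. */
static uint64_t round_down_to_power_of_2_sketch(uint64_t x) {
  /* Copy the highest set bit into every lower bit position. */
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;
  /* x is now 2^(k+1) - 1; subtracting the lower half leaves 2^k. */
  return x - (x >> 1);
}
```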
Jack O'Connor
087d72e08f clang-format 2020-01-22 21:32:35 -05:00
Jack O'Connor
92d421dea1 add a larger test case
One thing I like to test is that, if I hack simd_degree to be higher
than MAX_SIMD_DEGREE, assertions fire. This requires a test case long
enough to exceed that number of chunks.
2020-01-22 21:19:47 -05:00
Jack O'Connor
78e858d050 expand comments about lazy merging 2020-01-21 12:09:42 -05:00
Jack O'Connor
ccadbad244 stack size in the optimized impl should be MAX_DEPTH + 1 2020-01-21 11:41:20 -05:00
Jack O'Connor
d0c8fc16b3 use a better popcnt fallback algorithm
This one loops once for every set bit, rather than once for each bit
position to the right of the highest set bit.

https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation
2020-01-21 10:47:00 -05:00
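The fallback described in d0c8fc16b3 matches the well-known trick of clearing the lowest set bit on each iteration, which is covered in the linked Wikipedia section. A minimal sketch is given here. The function name `popcnt_fallback` is hypothetical and the exact code in the repository may differ.

```c
#include <stdint.h>

/* Hypothetical helper: count the set bits in x. The loop body runs once
   per set bit, not once per bit position. */
static unsigned popcnt_fallback(uint64_t x) {
  unsigned count = 0;
  while (x != 0) {
    count += 1;
    x &= x - 1; /* clears the lowest set bit */
  }
  return count;
}
```

For sparse inputs such as a single high bit, this finishes in one iteration, whereas a shift-and-test loop would need up to 64.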
Jack O'Connor
67262dff31 double the maximum incremental subtree size
Because compress_subtree_to_parent_node effectively cuts its input in
half, we can give it an input that's twice as big, without violating the
CV stack invariant.
2020-01-20 19:25:55 -05:00
Jack O'Connor
4a92e8eeb1 add the reference impl doc test to CI 2020-01-20 16:36:30 -05:00
Jack O'Connor
4021636022 test the BLAKE3_NO_* vars in CI 2020-01-20 16:19:16 -05:00
Jack O'Connor
40f4bdc22a switch from BLAKE3_USE_* to BLAKE3_NO_*
This means that compiling C sources includes all implementations by
default, which is what most callers are going to want.
2020-01-20 15:24:03 -05:00
Samuel Neves
66da5afb0c make things more modular 2020-01-20 12:03:31 -05:00
Jack O'Connor
491f799fd9 clarify the --no-mmap logic a bit 2020-01-20 12:03:31 -05:00
Cesar Eduardo Barros
273a679ddc b3sum: add no-mmap option
Using mmap is not always the best option. For instance, if the file is
truncated while being read, b3sum will receive a SIGBUS and abort.

Follow ripgrep's lead and add a --no-mmap option to disable mmap. This
can also help benchmark the mmap versus the read path, and help debug
performance issues potentially caused by mmap access patterns (like
issue #32).
2020-01-20 11:58:07 -05:00
Samuel Neves
b8c33e11ef manually prefetch message blocks 2020-01-19 18:45:37 +00:00
Jack O'Connor
a3147eb909 comment about parallelism 2020-01-18 14:32:52 -05:00
Jack O'Connor
14cd5c51c4 version 0.1.2
Changes since 0.1.1:
- b3sum no longer mmaps files smaller than 16 KiB. This improves
  performance for hashing many small files. Contributed by @xzfc.
- b3sum now supports --raw output. Contributed by @phayes.
2020-01-17 13:58:55 -05:00
Jack O'Connor
7ee89fe738 update b3sum help text in README.md 2020-01-17 13:54:58 -05:00
Jack O'Connor
e2ce07601f edit the --raw help string 2020-01-17 13:36:09 -05:00