mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-11-15 13:43:08 +01:00
Commit Graph

508 Commits

Author SHA1 Message Date
Jack O'Connor
f4a28dc21a bench_just_kernel2 2022-12-17 00:17:27 -08:00
Jack O'Connor
4f502617a6 try full transposition 2022-11-23 17:01:48 -08:00
Jack O'Connor
f1ac4cf06b missing inlines 2022-11-23 16:36:04 -08:00
Jack O'Connor
c916f6d463 kernel2 benches 2022-11-23 16:35:21 -08:00
Jack O'Connor
a08a0f05ab xor_xof_16 2022-11-23 15:48:03 -08:00
Jack O'Connor
7562b7c4dc correct the counter values 2022-11-23 15:08:38 -08:00
Jack O'Connor
1ef99db193 WIP i don't remember what this is 2022-11-21 13:23:20 -08:00
Jack O'Connor
0ab6dbcc47 WIP kernel2 2022-10-10 15:25:36 -07:00
Jack O'Connor
e17743e8fd kernel_3d_16 and xof functions 2022-04-09 13:31:19 -07:00
Jack O'Connor
35ad4ededd xor_xof variants for the 2d kernel 2022-03-26 11:18:39 -04:00
Jack O'Connor
ea94b544fc blake3_avx512_xof_stream_4 2022-03-20 20:17:31 -04:00
Jack O'Connor
9139fa40e8 blake3_avx2_xof_stream_2 2022-03-20 18:35:41 -04:00
Jack O'Connor
39ee6f4868 blake3_avx512_xof_stream_2 2022-03-20 18:26:02 -04:00
Jack O'Connor
08288c73bd initial xof_stream functions 2022-03-20 16:15:12 -04:00
Jack O'Connor
18962919e9 add some comments 2022-03-20 11:19:35 -04:00
Jack O'Connor
0e8a65d9ad rename kernel_1 to kernel2d_1 and add degree args 2022-03-16 16:25:54 -04:00
Jack O'Connor
ee558b2f32 generate blake3_{avx512,sse41,sse2}_compress with asm.py 2022-03-15 14:03:02 -04:00
Jack O'Connor
2e5eb837e5 replace tail calls with jumps 2022-03-11 01:49:50 -05:00
Jack O'Connor
328542d837 blake3_avx512_chunks_8 and blake3_avx512_parents_8 2022-03-11 00:22:12 -05:00
Jack O'Connor
2156d05d4d blake3_avx512_xof_xor_16 2022-03-09 15:15:37 -05:00
Jack O'Connor
ffed5c5605 test unaligned writes 2022-03-09 14:39:24 -05:00
Jack O'Connor
09c2b9141c broadcast the block length and domain flags inside blake3_avx512_kernel_16
blake3_avx512_xof_stream_16 was also incorrectly hardcoding a block
length of 64. The block length parameter is the *input* block length,
which is independent of the output block length. (The output block
length is not a compression function parameter.)
2022-03-09 12:19:14 -05:00
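The distinction drawn in the commit above can be sketched in Python. This is an illustrative transcription of the compression function as described in the BLAKE3 specification, not the assembly from this branch; the helper name `xof_stream` and its signature are assumptions chosen for the example. The point it shows is that `block_len` describes the input block (3 bytes for "abc"), while every call still yields 64 bytes of output and only the counter changes between output blocks.

```python
import struct

# Constants as given in the BLAKE3 specification.
IV = [0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19]
MSG_PERMUTATION = [2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8]
CHUNK_START, CHUNK_END, ROOT = 1 << 0, 1 << 1, 1 << 3
MASK = 0xFFFFFFFF


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def _g(s, a, b, c, d, mx, my):
    # The quarter-round mixing function.
    s[a] = (s[a] + s[b] + mx) & MASK
    s[d] = _rotr(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = _rotr(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b] + my) & MASK
    s[d] = _rotr(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = _rotr(s[b] ^ s[c], 7)


def _round(s, m):
    # Mix the columns, then the diagonals.
    _g(s, 0, 4, 8, 12, m[0], m[1])
    _g(s, 1, 5, 9, 13, m[2], m[3])
    _g(s, 2, 6, 10, 14, m[4], m[5])
    _g(s, 3, 7, 11, 15, m[6], m[7])
    _g(s, 0, 5, 10, 15, m[8], m[9])
    _g(s, 1, 6, 11, 12, m[10], m[11])
    _g(s, 2, 7, 8, 13, m[12], m[13])
    _g(s, 3, 4, 9, 14, m[14], m[15])


def compress(cv, block_words, counter, block_len, flags):
    # block_len is the number of *input* bytes in this block (0..64).
    s = list(cv) + IV[:4] + [counter & MASK, (counter >> 32) & MASK,
                             block_len, flags]
    m = list(block_words)
    for _ in range(7):
        _round(s, m)
        m = [m[p] for p in MSG_PERMUTATION]
    for i in range(8):
        s[i] ^= s[i + 8]
        s[i + 8] ^= cv[i]
    return s  # always 16 words, i.e. 64 output bytes


def xof_stream(cv, block_words, block_len, flags, out_len):
    # Hypothetical helper: repeat the root compression with an
    # incrementing counter. block_len is passed through unchanged.
    out = bytearray()
    counter = 0
    while len(out) < out_len:
        words = compress(cv, block_words, counter, block_len, flags | ROOT)
        out += struct.pack("<16I", *words)
        counter += 1
    return bytes(out[:out_len])


# A single-block input of 3 bytes, extended to 200 bytes of output.
block = b"abc".ljust(64, b"\0")
words = struct.unpack("<16I", block)
out = xof_stream(IV, words, len(b"abc"), CHUNK_START | CHUNK_END, 200)
```

Hardcoding 64 in place of `len(b"abc")` would silently produce a different stream, which is the kind of bug the commit message describes.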
Jack O'Connor
506ae0b0fe move third row initialization into blake3_avx512_kernel_16 2022-03-09 11:21:13 -05:00
Jack O'Connor
deac825436 interleave the write ops in blake3_avx512_xor_stream_16
This seems to give a small but consistent performance boost.
2022-03-09 00:56:09 -05:00
Jack O'Connor
4c929ddac1 blake3_avx512_xof_stream_16 2022-03-09 00:29:37 -05:00
Jack O'Connor
5d46559201 split the left and right child CVs for blake3_avx512_parents_16
There's no reason to force the caller to allocate them together.
2022-03-08 22:41:27 -05:00
Jack O'Connor
4e8ae445c4 blake3_avx512_parents_16 2022-03-08 22:23:09 -05:00
Jack O'Connor
ec669de03e use a memory argument for vpbroadcastd 2022-03-08 22:23:09 -05:00
Jack O'Connor
9fdea0db7c describe the transposition in comments 2022-03-08 22:23:09 -05:00
Jack O'Connor
bcbbcc8d2c now using only 3 scratch zmm registers 2022-03-08 22:23:09 -05:00
Jack O'Connor
78b8e87f91 interleave the first pass -- good performance 2022-03-08 22:23:09 -05:00
Jack O'Connor
87a9318233 try it with 4 times as many loads 2022-03-08 22:23:09 -05:00
Jack O'Connor
d9b803304c add a benchmark 2022-03-08 22:23:09 -05:00
Jack O'Connor
e4397683ef blake3_avx512_chunks_16 2022-03-08 22:23:09 -05:00
Jack O'Connor
3f066236ad unroll the block loop and load the key 2022-03-08 22:23:09 -05:00
Jack O'Connor
0421fb1b00 correct the last two transposition passes 2022-03-08 22:23:09 -05:00
Jack O'Connor
67f5307c38 nonzero message 2022-03-08 22:23:09 -05:00
Jack O'Connor
f61d003953 start working on a refactored assembly implementation
The main goal is to eventually have extended outputs benefit from the
same SIMD optimizations as inputs. To make this easier, I want to factor
out a shared "kernel" routine that can be shared among several
different interfaces:

- compressing chunks
- compressing parents
- producing XOF output
- xor'ing XOF output

The timing here partly coincides with Rust stabilizing inline asm.
That's certainly not necessary for any of this to work, but it gives me
the confidence to try this without needing to master the rules of three
different calling conventions.
2022-03-08 22:09:07 -05:00
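The refactoring goal stated in the commit above, one kernel shared by four interfaces, might look roughly like the following Python sketch. Everything here is hypothetical: the function names and signatures are invented for illustration, and `kernel` uses SHA-512 purely as a stand-in that returns 64 bytes. It is not the BLAKE3 compression function and not the assembly from this branch.

```python
import hashlib
import struct


def kernel(cv, block, counter, flags):
    # Stand-in for the shared routine: NOT BLAKE3 compression.
    # Takes a 32-byte CV and a block, returns 64 bytes.
    return hashlib.sha512(cv + block + struct.pack("<QI", counter, flags)).digest()


def compress_chunks(key, chunks, flags):
    # Each chunk is a list of 64-byte blocks; chain the CV through them.
    cvs = []
    for index, chunk in enumerate(chunks):
        cv = key
        for block in chunk:
            cv = kernel(cv, block, index, flags)[:32]
        cvs.append(cv)
    return cvs


def compress_parents(key, left_cvs, right_cvs, flags):
    # Left and right child CVs arrive as separate sequences.
    return [kernel(key, left + right, 0, flags)[:32]
            for left, right in zip(left_cvs, right_cvs)]


def xof_stream(cv, block, flags, start_counter, n_blocks):
    # Produce n_blocks * 64 bytes of extended output.
    return b"".join(kernel(cv, block, start_counter + i, flags)
                    for i in range(n_blocks))


def xof_xor(cv, block, flags, start_counter, buf):
    # XOR the extended output into buf (a bytearray) in place.
    n_blocks = -(-len(buf) // 64)  # ceiling division
    stream = xof_stream(cv, block, flags, start_counter, n_blocks)
    for i in range(len(buf)):
        buf[i] ^= stream[i]


key = bytes(32)
block = bytes(range(64))
message = bytearray(b"hello, extended output")
xof_xor(key, block, 0, 0, message)   # mask the message
xof_xor(key, block, 0, 0, message)   # unmask it again
```

With this shape, any optimization applied inside `kernel` is picked up by all four callers, which matches the motivation given in the commit message.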
Jack O'Connor
9cd41c0cfd link to reference impl ports from the main readme too 2022-03-05 14:52:28 -05:00
Jack O'Connor
039f8cdc20 link to ports of the reference implementation 2022-03-04 21:02:18 -05:00
Jack O'Connor
48c4621edc add "(if any)" regarding keying in the security notes 2022-03-04 10:19:14 -05:00
Jack O'Connor
3e67a8f45b correct the security notes for the C API 2022-03-03 12:06:14 -05:00
Jack O'Connor
d295410aad simplify a bit more 2022-03-03 11:52:58 -05:00
Jack O'Connor
b3c06e46ed simplify the security notes, avoid referring to entropy 2022-03-02 19:05:15 -05:00
Jack O'Connor
153d46e11a copy the same notes to the C docs 2022-03-02 17:55:05 -05:00
Jack O'Connor
ea3bc782d8 document the extended output security issue found by Aldo Gunsing
https://eprint.iacr.org/2022/283
2022-03-02 17:39:25 -05:00
Jack O'Connor
4e84c8c7ae version 1.3.1
Changes since 1.3.0:
- The unstable `traits-preview` feature now includes an implementation
  of `crypto_common::BlockSizeUser`, AKA
  `digest::core_api::BlockSizeUser`. This allows `blake3::Hasher` to be
  used with `hmac::SimpleHmac`.
1.3.1
2022-01-25 12:02:56 -05:00
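As background for the 1.3.1 release note above, here is a minimal Python sketch of why a generic HMAC construction needs to know the hash's block size: the key is padded to exactly one block before the inner and outer pads are applied (RFC 2104). The function `simple_hmac` is a hypothetical name for this example, and SHA-256 is used only because the Python standard library has no BLAKE3; both SHA-256 and BLAKE3 use 64-byte blocks.

```python
import hashlib
import hmac


def simple_hmac(hash_ctor, block_size, key, message):
    # Keys longer than one block are hashed first, then padded with zeros.
    if len(key) > block_size:
        key = hash_ctor(key).digest()
    key = key.ljust(block_size, b"\0")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hash_ctor(ipad + message).digest()
    return hash_ctor(opad + inner).digest()


# SHA-256 as a stand-in hash with a 64-byte block size.
tag = simple_hmac(hashlib.sha256, 64, b"secret key", b"message")
```

Without a declared block size, the padding step has nothing to pad to, which is the information that a `BlockSizeUser` implementation supplies.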
Jack O'Connor
15447749ef add a release checklist 2022-01-25 12:02:56 -05:00
Jack O'Connor
540f708a94 check the HMAC output bytes 2022-01-24 20:52:22 -05:00
jbis9051
509e97ed90 Adds test 2022-01-24 19:29:33 -05:00