Jack O'Connor
f4a28dc21a
bench_just_kernel2
2022-12-17 00:17:27 -08:00
Jack O'Connor
4f502617a6
try full transposition
2022-11-23 17:01:48 -08:00
Jack O'Connor
f1ac4cf06b
missing inlines
2022-11-23 16:36:04 -08:00
Jack O'Connor
c916f6d463
kernel2 benches
2022-11-23 16:35:21 -08:00
Jack O'Connor
a08a0f05ab
xor_xof_16
2022-11-23 15:48:03 -08:00
Jack O'Connor
7562b7c4dc
correct the counter values
2022-11-23 15:08:38 -08:00
Jack O'Connor
1ef99db193
WIP I don't remember what this is
2022-11-21 13:23:20 -08:00
Jack O'Connor
0ab6dbcc47
WIP kernel2
2022-10-10 15:25:36 -07:00
Jack O'Connor
e17743e8fd
kernel_3d_16 and xof functions
2022-04-09 13:31:19 -07:00
Jack O'Connor
35ad4ededd
xor_xof variants for the 2d kernel
2022-03-26 11:18:39 -04:00
Jack O'Connor
ea94b544fc
blake3_avx512_xof_stream_4
2022-03-20 20:17:31 -04:00
Jack O'Connor
9139fa40e8
blake3_avx2_xof_stream_2
2022-03-20 18:35:41 -04:00
Jack O'Connor
39ee6f4868
blake3_avx512_xof_stream_2
2022-03-20 18:26:02 -04:00
Jack O'Connor
08288c73bd
initial xof_stream functions
2022-03-20 16:15:12 -04:00
Jack O'Connor
18962919e9
add some comments
2022-03-20 11:19:35 -04:00
Jack O'Connor
0e8a65d9ad
rename kernel_1 to kernel2d_1 and add degree args
2022-03-16 16:25:54 -04:00
Jack O'Connor
ee558b2f32
generate blake3_{avx512,sse41,sse2}_compress with asm.py
2022-03-15 14:03:02 -04:00
Jack O'Connor
2e5eb837e5
replace tail calls with jumps
2022-03-11 01:49:50 -05:00
Jack O'Connor
328542d837
blake3_avx512_chunks_8 and blake3_avx512_parents_8
2022-03-11 00:22:12 -05:00
Jack O'Connor
2156d05d4d
blake3_avx512_xof_xor_16
2022-03-09 15:15:37 -05:00
Jack O'Connor
ffed5c5605
test unaligned writes
2022-03-09 14:39:24 -05:00
Jack O'Connor
09c2b9141c
broadcast the block length and domain flags inside blake3_avx512_kernel_16
blake3_avx512_xof_stream_16 was also incorrectly hardcoding a block
length of 64. The block length parameter is the *input* block length,
which is independent of the output block length. (The output block
length is not a compression function parameter.)
2022-03-09 12:19:14 -05:00
Jack O'Connor
506ae0b0fe
move third row initialization into blake3_avx512_kernel_16
2022-03-09 11:21:13 -05:00
Jack O'Connor
deac825436
interleave the write ops in blake3_avx512_xor_stream_16
This seems to give a small but consistent performance boost.
2022-03-09 00:56:09 -05:00
Jack O'Connor
4c929ddac1
blake3_avx512_xof_stream_16
2022-03-09 00:29:37 -05:00
Jack O'Connor
5d46559201
split the left and right child CVs for blake3_avx512_parents_16
There's no reason to force the caller to allocate them together.
2022-03-08 22:41:27 -05:00
Jack O'Connor
4e8ae445c4
blake3_avx512_parents_16
2022-03-08 22:23:09 -05:00
Jack O'Connor
ec669de03e
use a memory argument for vpbroadcastd
2022-03-08 22:23:09 -05:00
Jack O'Connor
9fdea0db7c
describe the transposition in comments
2022-03-08 22:23:09 -05:00
Jack O'Connor
bcbbcc8d2c
now using only 3 scratch zmm registers
2022-03-08 22:23:09 -05:00
Jack O'Connor
78b8e87f91
interleave the first pass -- good performance
2022-03-08 22:23:09 -05:00
Jack O'Connor
87a9318233
try it with 4 times as many loads
2022-03-08 22:23:09 -05:00
Jack O'Connor
d9b803304c
add a benchmark
2022-03-08 22:23:09 -05:00
Jack O'Connor
e4397683ef
blake3_avx512_chunks_16
2022-03-08 22:23:09 -05:00
Jack O'Connor
3f066236ad
unroll the block loop and load the key
2022-03-08 22:23:09 -05:00
Jack O'Connor
0421fb1b00
correct the last two transposition passes
2022-03-08 22:23:09 -05:00
Jack O'Connor
67f5307c38
nonzero message
2022-03-08 22:23:09 -05:00
Jack O'Connor
f61d003953
start working on a refactored assembly implementation
The main goal is to eventually have extended outputs benefit from the
same SIMD optimizations as inputs. To make this easier, I want to factor
out a "kernel" routine that can be shared among several
different interfaces:
- compressing chunks
- compressing parents
- producing XOF output
- xor'ing XOF output
The timing here partly coincides with Rust stabilizing inline asm.
That's certainly not necessary for any of this to work, but it gives me
the confidence to try this without needing to master the rules of three
different calling conventions.
2022-03-08 22:09:07 -05:00
Jack O'Connor
9cd41c0cfd
link to reference impl ports from the main readme too
2022-03-05 14:52:28 -05:00
Jack O'Connor
039f8cdc20
link to ports of the reference implementation
2022-03-04 21:02:18 -05:00
Jack O'Connor
48c4621edc
add "(if any)" regarding keying in the security notes
2022-03-04 10:19:14 -05:00
Jack O'Connor
3e67a8f45b
correct the security notes for the C API
2022-03-03 12:06:14 -05:00
Jack O'Connor
d295410aad
simplify a bit more
2022-03-03 11:52:58 -05:00
Jack O'Connor
b3c06e46ed
simplify the security notes, avoid referring to entropy
2022-03-02 19:05:15 -05:00
Jack O'Connor
153d46e11a
copy the same notes to the C docs
2022-03-02 17:55:05 -05:00
Jack O'Connor
ea3bc782d8
document the extended output security issue found by Aldo Gunsing
https://eprint.iacr.org/2022/283
2022-03-02 17:39:25 -05:00
Jack O'Connor
4e84c8c7ae
version 1.3.1
Changes since 1.3.0:
- The unstable `traits-preview` feature now includes an implementation
  of `crypto_common::BlockSizeUser`, AKA
  `digest::core_api::BlockSizeUser`. This allows `blake3::Hasher` to be
  used with `hmac::SimpleHmac`.
1.3.1
2022-01-25 12:02:56 -05:00
Jack O'Connor
15447749ef
add a release checklist
2022-01-25 12:02:56 -05:00
Jack O'Connor
540f708a94
check the HMAC output bytes
2022-01-24 20:52:22 -05:00
jbis9051
509e97ed90
Adds test
2022-01-24 19:29:33 -05:00