
The BLAKE3 Guts API

Introduction

This blake3_guts sub-crate contains low-level, high-performance, platform-specific implementations of the BLAKE3 compression function. This API is complicated and unsafe, and this crate will never have a stable release. Most callers should instead use the blake3 crate, which will eventually depend on this one internally.

The code you see here (as of January 2024) is an early stage of a large planned refactor. The motivation for this refactor is a couple of missing features in both the Rust and C implementations:

  • The output side (OutputReader in Rust) doesn't take advantage of the most important SIMD optimizations that compute multiple blocks in parallel. This blocks any project that wants to use the BLAKE3 XOF as a stream cipher ([1], [2]).
  • Low-level callers like Bao that need interior nodes of the tree also don't get those SIMD optimizations. They have to use a slow, minimalistic, unstable, doc-hidden module (also called guts).

The difficulty with adding those features is that they require changes to all of our optimized assembly and C intrinsics code. That's a couple dozen different files that are large, platform-specific, difficult to understand, and full of duplicated code. The higher-level Rust and C implementations of BLAKE3 both depend on these files and will need to coordinate changes.

At the same time, it won't be long before we add support for more platforms:

  • RISC-V vector extensions
  • ARM SVE
  • WebAssembly SIMD

It's important to get this refactor done before new platforms make it even harder to do.

The private guts API

This is the API that each platform reimplements, so we want it to be as simple as possible apart from the high-performance work it needs to do. It's completely unsafe, and inputs and outputs are raw pointers that are allowed to alias (this matters for hash_parents, see below).

  • degree
  • compress
    • The single-block compression function, used for short inputs and odd-length tails.
  • hash_chunks
  • hash_parents
  • xof
  • xof_xor
    • Like xof, but XORs the result into the output buffer.
  • universal_hash
    • This is a new construction specifically to support BLAKE3-AEAD. Some implementations might just stub it out with portable code.
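The aliasing rule is easiest to see in code. Below is a minimal portable sketch. The function names and raw-pointer signatures are hypothetical choices for illustration, not this crate's actual private API. Chaining values (CVs) are passed as `u32` words rather than little-endian bytes for brevity. `compress` is a plain scalar BLAKE3 compression function. `hash_parents` reads `num_parents` pairs of child CVs and writes one parent CV per pair. It copies each 64-byte pair into a local block before writing output `i`, and output `i` only overwrites words it has already read, so `out` may point at `input`.

```rust
use std::ptr;

const IV: [u32; 8] = [
    0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
    0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19,
];
const MSG_PERMUTATION: [usize; 16] = [2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8];
const CHUNK_START: u32 = 1 << 0;
const CHUNK_END: u32 = 1 << 1;
const PARENT: u32 = 1 << 2;
const ROOT: u32 = 1 << 3;

// The BLAKE3 quarter-round (G function).
fn g(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize, mx: u32, my: u32) {
    s[a] = s[a].wrapping_add(s[b]).wrapping_add(mx);
    s[d] = (s[d] ^ s[a]).rotate_right(16);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_right(12);
    s[a] = s[a].wrapping_add(s[b]).wrapping_add(my);
    s[d] = (s[d] ^ s[a]).rotate_right(8);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_right(7);
}

/// Scalar BLAKE3 compression of one 64-byte block; returns the full 16-word output.
fn compress(cv: &[u32; 8], block: &[u32; 16], counter: u64, block_len: u32, flags: u32) -> [u32; 16] {
    let mut s = [
        cv[0], cv[1], cv[2], cv[3], cv[4], cv[5], cv[6], cv[7],
        IV[0], IV[1], IV[2], IV[3],
        counter as u32, (counter >> 32) as u32, block_len, flags,
    ];
    let mut m = *block;
    for round in 0..7 {
        // Column step, then diagonal step.
        g(&mut s, 0, 4, 8, 12, m[0], m[1]);
        g(&mut s, 1, 5, 9, 13, m[2], m[3]);
        g(&mut s, 2, 6, 10, 14, m[4], m[5]);
        g(&mut s, 3, 7, 11, 15, m[6], m[7]);
        g(&mut s, 0, 5, 10, 15, m[8], m[9]);
        g(&mut s, 1, 6, 11, 12, m[10], m[11]);
        g(&mut s, 2, 7, 8, 13, m[12], m[13]);
        g(&mut s, 3, 4, 9, 14, m[14], m[15]);
        if round < 6 {
            // Permute the message words for the next round.
            let old = m;
            for i in 0..16 {
                m[i] = old[MSG_PERMUTATION[i]];
            }
        }
    }
    for i in 0..8 {
        s[i] ^= s[i + 8];
        s[i + 8] ^= cv[i];
    }
    s
}

/// Hypothetical private-API-style entry point.
/// Safety: `input` must be readable for `16 * num_parents` words, `key` for 8 words,
/// and `out` writable for `8 * num_parents` words. `out` MAY alias `input`.
pub unsafe fn hash_parents(input: *const u32, num_parents: usize, key: *const u32, flags: u32, out: *mut u32) {
    unsafe {
        let mut k = [0u32; 8];
        ptr::copy_nonoverlapping(key, k.as_mut_ptr(), 8);
        for i in 0..num_parents {
            // Read the whole pair (left CV || right CV) before writing anything.
            let mut block = [0u32; 16];
            ptr::copy_nonoverlapping(input.add(16 * i), block.as_mut_ptr(), 16);
            let parent = compress(&k, &block, 0, 64, flags | PARENT);
            // Output i covers words [8i, 8i + 8), all inside pairs already read.
            ptr::copy_nonoverlapping(parent.as_ptr(), out.add(8 * i), 8);
        }
    }
}
```

Calling it with `out == input` halves the number of CVs in place, which is the step a tree reduction repeats level by level.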

The public guts API

This is the API that this crate exposes to callers, i.e. to the main blake3 crate. It's a thin, portable layer on top of the private API above. The Rust version of this API is memory-safe.

  • degree
  • compress
  • hash_chunks
  • hash_parents
    • This handles most levels of the tree, where we keep hashing SIMD_DEGREE parents at a time.
  • reduce_parents
    • This uses the same hash_parents private API, but it handles the top levels of the tree where we reduce in-place to the root parent node.
  • xof
  • xof_xor
  • universal_hash
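On the public side, a safe wrapper can express the in-place case with a single `&mut [u32]` buffer instead of two aliasing raw pointers. That is one way a memory-safe Rust layer could sit on top of the unsafe one. The sketch below is hypothetical: the names, the word-based CV layout, and the scalar loop are assumptions, and a real implementation would hand `SIMD_DEGREE` pairs at a time to a platform's `hash_parents`. It repeats a scalar `compress` so the block stands alone. `hash_parents_in_place` hashes one tree level and carries an odd trailing CV up unchanged. `reduce_parents` repeats that until two CVs remain, then compresses them with the `ROOT` flag.

```rust
const IV: [u32; 8] = [
    0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
    0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19,
];
const MSG_PERMUTATION: [usize; 16] = [2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8];
const PARENT: u32 = 1 << 2;
const ROOT: u32 = 1 << 3;

// The BLAKE3 quarter-round (G function).
fn g(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize, mx: u32, my: u32) {
    s[a] = s[a].wrapping_add(s[b]).wrapping_add(mx);
    s[d] = (s[d] ^ s[a]).rotate_right(16);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_right(12);
    s[a] = s[a].wrapping_add(s[b]).wrapping_add(my);
    s[d] = (s[d] ^ s[a]).rotate_right(8);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_right(7);
}

// Scalar BLAKE3 compression of one 64-byte block; returns the full 16-word output.
fn compress(cv: &[u32; 8], block: &[u32; 16], counter: u64, block_len: u32, flags: u32) -> [u32; 16] {
    let mut s = [
        cv[0], cv[1], cv[2], cv[3], cv[4], cv[5], cv[6], cv[7],
        IV[0], IV[1], IV[2], IV[3],
        counter as u32, (counter >> 32) as u32, block_len, flags,
    ];
    let mut m = *block;
    for round in 0..7 {
        g(&mut s, 0, 4, 8, 12, m[0], m[1]);
        g(&mut s, 1, 5, 9, 13, m[2], m[3]);
        g(&mut s, 2, 6, 10, 14, m[4], m[5]);
        g(&mut s, 3, 7, 11, 15, m[6], m[7]);
        g(&mut s, 0, 5, 10, 15, m[8], m[9]);
        g(&mut s, 1, 6, 11, 12, m[10], m[11]);
        g(&mut s, 2, 7, 8, 13, m[12], m[13]);
        g(&mut s, 3, 4, 9, 14, m[14], m[15]);
        if round < 6 {
            let old = m;
            for i in 0..16 {
                m[i] = old[MSG_PERMUTATION[i]];
            }
        }
    }
    for i in 0..8 {
        s[i] ^= s[i + 8];
        s[i + 8] ^= cv[i];
    }
    s
}

/// Hypothetical safe wrapper: hashes one tree level in place. CV pairs at the front of
/// `cvs` (8 words per CV) become parent CVs; an odd trailing CV is carried up unchanged.
/// Returns the number of CVs at the next level.
pub fn hash_parents_in_place(cvs: &mut [u32], num_cvs: usize, key: &[u32; 8], flags: u32) -> usize {
    assert!(cvs.len() >= 8 * num_cvs, "buffer too small for num_cvs");
    let num_parents = num_cvs / 2;
    for i in 0..num_parents {
        let mut block = [0u32; 16];
        block.copy_from_slice(&cvs[16 * i..16 * (i + 1)]);
        let out = compress(key, &block, 0, 64, flags | PARENT);
        cvs[8 * i..8 * (i + 1)].copy_from_slice(&out[..8]);
    }
    if num_cvs % 2 == 1 {
        cvs.copy_within(8 * (num_cvs - 1)..8 * num_cvs, 8 * num_parents);
    }
    num_parents + num_cvs % 2
}

/// Hypothetical safe wrapper: reduces `num_cvs >= 2` CVs to the root CV in place.
pub fn reduce_parents(cvs: &mut [u32], mut num_cvs: usize, key: &[u32; 8], flags: u32) -> [u32; 8] {
    // With a single chunk, the root is that chunk's final block, not a parent node.
    assert!(num_cvs >= 2, "need at least two CVs");
    assert!(cvs.len() >= 8 * num_cvs, "buffer too small for num_cvs");
    while num_cvs > 2 {
        num_cvs = hash_parents_in_place(cvs, num_cvs, key, flags);
    }
    // The last parent is the root node, so it gets the ROOT flag.
    let mut block = [0u32; 16];
    block.copy_from_slice(&cvs[..16]);
    let out = compress(key, &block, 0, 64, flags | PARENT | ROOT);
    let mut root = [0u32; 8];
    root.copy_from_slice(&out[..8]);
    root
}
```

Pairing from the left and carrying the odd CV up at each level yields BLAKE3's left-heavy tree shape. For example, five chunk CVs reduce as `(((0,1),(2,3)),4)`, which matches the spec's rule of putting the largest power-of-two number of chunks in the left subtree.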