diff --git a/c/README.md b/c/README.md index af408a1..f597ead 100644 --- a/c/README.md +++ b/c/README.md @@ -1,29 +1,9 @@ -This is the C implementation of BLAKE3. The public API consists of one -struct and five functions in [`blake3.h`](blake3.h): +The official C implementation of BLAKE3. -- **`typedef struct {...} blake3_hasher`** An incremental BLAKE3 hashing - state, which can accept any number of updates. -- **`blake3_hasher_init(...)`** Initialize a `blake3_hasher` in the - default hashing mode. -- **`blake3_hasher_init_keyed(...)`** Initialize a `blake3_hasher` in - the keyed hashing mode, which accepts a 256-bit key. -- **`blake3_hasher_init_derive_key(...)`** Initialize a `blake3_hasher` - in the key derivation mode, which accepts a context string of any - length. In this mode, the key material is given as input after - initialization. The context string should be hardcoded, globally - unique, and application-specific. A good default format for such - strings is `"[application] [commit timestamp] [purpose]"`, e.g., - `"example.com 2019-12-25 16:18:03 session tokens v1"`. -- **`blake3_hasher_update(...)`** Add input to the hasher. This can be - called any number of times. -- **`blake3_hasher_finalize(...)`** Finalize the hasher and emit an - output of any length. This does not modify the hasher itself. It is - possible to finalize again after adding more input. +# Example -## Example - -Here's an example program that hashes bytes from standard input and -prints the result: +An example program that hashes bytes from standard input and prints the +result: ```c #include "blake3.h" @@ -63,15 +43,103 @@ gcc -O3 -o example example.c blake3.c blake3_dispatch.c blake3_portable.c \ blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S blake3_avx512_x86-64_unix.S ``` -## Building +# API -The Makefile included in this implementation is for testing. It's -expected that callers will have their own build systems. This section -describes the compilation steps that build systems (or folks compiling -by hand) should take. Note that these steps may change in future -versions. +## The Struct -### x86 +```c +typedef struct { + // private fields +} blake3_hasher; +``` + +An incremental BLAKE3 hashing state, which can accept any number of +updates. This implementation doesn't allocate any heap memory, but +`sizeof(blake3_hasher)` itself is relatively large, currently 1912 bytes +on x86-64. This size can be reduced by restricting the maximum input +length, as described in Section 5.4 of [the BLAKE3 +spec](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf), +but this implementation doesn't currently support that strategy. + +## Common API Functions + +```c +void blake3_hasher_init( + blake3_hasher *self); +``` + +Initialize a `blake3_hasher` in the default hashing mode. + +```c +void blake3_hasher_update( + blake3_hasher *self, + const void *input, + size_t input_len); +``` + +Add input to the hasher. This can be called any number of times. + +```c +void blake3_hasher_finalize( + const blake3_hasher *self, + uint8_t *out, + size_t out_len); +``` + +Finalize the hasher and emit an output of any length. This doesn't +modify the hasher itself, and it's possible to finalize again after +adding more input. The constant `BLAKE3_OUT_LEN` provides the default +output length, 32 bytes. + +## Less Common API Functions + +```c +void blake3_hasher_init_keyed( + blake3_hasher *self, + const uint8_t key[BLAKE3_KEY_LEN]); +``` + +Initialize a `blake3_hasher` in the keyed hashing mode. The key must be +exactly 32 bytes. + +```c +void blake3_hasher_init_derive_key( + blake3_hasher *self, + const char *context); +``` + +Initialize a `blake3_hasher` in the key derivation mode. Key material +should be given as input after initialization, using +`blake3_hasher_update`. `context` is a standard C string of any length, +and the terminating null byte is not included. The context string should +be hardcoded, globally unique, and application-specific. A good default +format for the context string is `"[application] [commit timestamp] +[purpose]"`, e.g., `"example.com 2019-12-25 16:18:03 session tokens +v1"`. + +```c +void blake3_hasher_finalize_seek( + const blake3_hasher *self, + uint64_t seek, + uint8_t *out, + size_t out_len); +``` + +The same as `blake3_hasher_finalize`, but with an additional `seek` +parameter for the starting byte position in the output stream. To +efficiently stream a large output without allocating memory, call this +function in a loop, incrementing `seek` by the output length each time. + +# Building + +This implementation is just C files. It doesn't include a public-facing +build system. (The `Makefile` in this directory is only for testing.) +Instead, the intention is that you can include these C and assemly files +in whatever build system you're already using. This section describes +the commands your build system should execute, or which you can execute +by hand. Note that these steps may change in future versions. + +## x86 Dynamic dispatch is enabled by default on x86. The implementation will query the CPU at runtime to detect SIMD support, and it will use the @@ -110,7 +178,9 @@ gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c \ Note above that building `blake3_avx512.c` requires both `-mavx512f` and `-mavx512vl` under GCC and Clang, as shown above. Under MSVC, the single -`/arch:AVX512` flag is sufficient. +`/arch:AVX512` flag is sufficient. The MSVC equivalent of `-mavx2` is +`/arch:AVX2`. MSVC enables SSE4.1 by defaut, and it doesn't have a +corresponding flag. If you want to omit SIMD code on x86, you need to explicitly disable each instruction set. Here's an example of building a shared library on @@ -121,7 +191,7 @@ gcc -shared -O3 -o libblake3.so -DBLAKE3_NO_SSE41 -DBLAKE3_NO_AVX2 -DBLAKE3_NO_A blake3.c blake3_dispatch.c blake3_portable.c ``` -### ARM NEON +## ARM NEON The NEON implementation is not enabled by default on ARM, since not all ARM targets support it. To enable it, set `BLAKE3_USE_NEON=1`. Here's an @@ -144,7 +214,7 @@ in call to always_inline ‘vaddq_u32’: target specific option mismatch ...then you may need to add something like `-mfpu=neon-vfpv4 -mfloat-abi=hard`. -### Other Platforms +## Other Platforms The portable implementation should work on most other architectures. For example: @@ -153,16 +223,12 @@ example: gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c ``` -## Differences from the Rust Implementation +# Differences from the Rust Implementation The single-threaded Rust and C implementations use the same algorithms, and their performance is the same if you use the assembly implementations or if you compile the intrinsics-based implementations with Clang. (Both Clang and rustc are LLVM-based.) -The C implementation does not currently support multi-threading. OpenMP +The C implementation doesn't currently support multi-threading. OpenMP support or similar might be added in the future. - -Both the C and Rust implementations support output of any length, but -only the Rust implementation provides an incremental (and seekable) -output reader. This might also be added in the future.