mirror of https://github.com/BLAKE3-team/BLAKE3 synced 2024-04-20 09:13:56 +02:00
Commit Graph

184 Commits

Author SHA1 Message Date
Javier Blazquez 0816badf3a fix Windows ARM64 build and detect ARM64EC as ARM64 2024-04-07 11:48:02 -04:00
Jack O'Connor 54930c9522 version 1.5.1
Changes since 1.5.0:
- The Rust crate is now compatible with Miri.
- ~1% performance improvement on Arm NEON contributed by @divinity76 (#384).
- Various fixes and improvements in the CMake build.
- The MSRV of b3sum is now 1.74.1. (The MSRV of the library crate is
  unchanged, 1.66.1.)
2024-03-12 00:34:53 -07:00
divinity76 58bea0bcbb optimize neon loadu_128/storeu_128 (#384)
vld1q_u8 and vst1q_u8 have no alignment requirements.

This improves performance on Oracle Cloud's VM.Standard.A1.Flex by 1.15% on a 16*1024 input, from 13920 nanoseconds down to 13800 nanoseconds (approx)
2024-03-12 03:21:51 -04:00
Jack O'Connor 8fc36186b8 comment cleanup 2024-02-04 13:32:30 -08:00
divinity76 2918c51bc6 silence gcc Werror=logical-op
```
/home/travis/build/php/php-src/ext/hash/blake3/upstream_blake3/c/blake3.c: In function ‘compress_subtree_to_parent_node’:
/home/travis/build/php/php-src/ext/hash/blake3/upstream_blake3/c/blake3.c:354:22: error: logical ‘and’ of mutually exclusive tests is always false [-Werror=logical-op]
  354 |   while (num_cvs > 2 && num_cvs <= MAX_SIMD_DEGREE_OR_2) {
      |                      ^~
cc1: all warnings being treated as errors
make: *** [Makefile:1910: ext/hash/blake3/upstream_blake3/c/blake3.lo] Error 1
```

Fixes https://github.com/BLAKE3-team/BLAKE3/issues/379.
Closes https://github.com/BLAKE3-team/BLAKE3/pull/380.
2024-02-04 13:31:55 -08:00
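The warning above fires when `MAX_SIMD_DEGREE_OR_2` is 2, because the loop condition then reads `num_cvs > 2 && num_cvs <= 2`. A minimal sketch of one way to avoid it is to compile the loop out in that configuration. The function name `condense_cvs`, the default value of the macro, and the halving step are hypothetical stand-ins, not the code from the commit:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed default for illustration: a build with no SIMD support. */
#ifndef MAX_SIMD_DEGREE_OR_2
#define MAX_SIMD_DEGREE_OR_2 2
#endif

/* Hypothetical stand-in for the condensing loop: reduce num_cvs to <= 2. */
size_t condense_cvs(size_t num_cvs) {
  assert(num_cvs <= MAX_SIMD_DEGREE_OR_2);
#if MAX_SIMD_DEGREE_OR_2 > 2
  /* Only compiled when the loop can actually run, so GCC never sees
     the contradictory pair of tests "x > 2 && x <= 2". */
  while (num_cvs > 2 && num_cvs <= MAX_SIMD_DEGREE_OR_2) {
    num_cvs = (num_cvs + 1) / 2; /* hypothetical: halve, rounding up */
  }
#endif
  return num_cvs;
}
```

With the assumed default of 2, the function returns its input unchanged and the loop body is never seen by the compiler.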
Henrik S. Gaßmann 7ce2aa41e9 build(CMake): Require C99 mode
Specify language requirement as a [compile-feature] and force compiler
extensions off ensuring portability problems are detected early on.
Note that we do not use the `C_STANDARD` property, because it doesn't
propagate to dependent targets and would prohibit users from compiling
their code base with consistent flags / language configurations if they
were to target a newer C standard. Similarly we do not configure
`C_STANDARD_REQUIRED` as [compile-features] do not interact with
it--they are enforced regardless.

[compile-feature]: https://cmake.org/cmake/help/latest/manual/cmake-compile-features.7.html#compile-feature-requirements
2023-12-02 11:11:10 -08:00
Viacheslav H 1930721c50 Fix CMake target include directories if library is used with add_subdirectory or FetchContent 2023-11-05 12:16:48 -05:00
Rui Ueyama e1f851d461 Fix Windows build with clang-cl
clang-cl is LLVM's MSVC-compatible compiler frontend for Windows ABI.
If clang-cl is in use, `CMAKE_C_COMPILER_ID` is `Clang` even though
it doesn't take Unix-like command line options but MSVC-like options.

`if(MSVC)` is the correct predicate to check if we should pass MSVC-ish
command line options.
2023-11-05 09:08:13 -08:00
Henrik Gaßmann 3e14f865d3 style: Remove trailing whitespace in CMakeLists.txt 2023-10-31 11:51:26 +01:00
Henrik Gaßmann bfd568897a build(CMake): Provide NEON cflags for ARMv8 32bit
ARMv8 CPUs are guaranteed to support NEON instructions. However, for
32bit ARMv8 triplets GCC needs to explicitly be configured to enable
NEON intrinsics.
2023-10-31 11:45:26 +01:00
Henrik Gaßmann dd30dcb002 build(CMake): Apply PP definitions to all sources 2023-10-02 11:12:50 -07:00
Jack O'Connor 5aa53f07f7 version 1.5.0
Changes since 1.4.1:
- The Rust crate's Hasher type has gained new helper methods for common
  forms of IO: update_reader, update_mmap, and update_mmap_rayon. The
  latter matches the default behavior of b3sum. The mmap methods are
  gated by the new "mmap" Cargo feature.
- Most of the Rust crate's public types now implement the Zeroize trait.
  This is gated by the new "zeroize" Cargo feature.
- The Rust crate's Hash type now implements the serde Serialize and
  Deserialize traits. This is gated by the new "serde" Cargo feature.
- The C library now uses atomics to cache detected CPU features under
  most compilers other than MSVC. Previously this was a non-atomic
  write, which was probably "benign" but made TSan unhappy.
- NEON support is now disabled by default on big-endian AArch64.
  Previously this was a build error if the caller didn't explicitly
  disable it.
2023-09-20 20:12:18 -07:00
Havard Eidnes 8bfe93fbf9 c/blake3_impl.h: don't try to do NEON on big-endian aarch64.
...because this would otherwise hit
  #error "This implementation only supports little-endian ARM."
in c/blake3_neon.c.
2023-09-19 16:57:11 -07:00
Jack O'Connor 02dec6e9a6 fix a build break in the blake3_c tests 2023-09-10 14:04:57 -07:00
Jack O'Connor d6265dafc9 update dev-dependencies 2023-09-10 13:40:12 -07:00
Javier Blazquez 12823b8760 blake3_dispatch: Fix race condition initializing g_cpu_features.
If multiple threads try to compute a hash simultaneously before the library has been used for the first time,
the logic in get_cpu_features that detects CPU features will write to g_cpu_features without synchronization,
which is a race condition and flagged by ThreadSanitizer.

This change marks g_cpu_features as an atomic variable to address the race condition.
2023-07-21 19:18:40 -07:00
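The race described above can be sketched with C11 atomics. This is an illustration under stated assumptions, not the library's code: the names `g_features`, `FEATURE_UNDEFINED`, `detect_features`, and `get_features` are hypothetical, and the detection routine returns a fixed placeholder value instead of querying the CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "detection has not run yet". */
#define FEATURE_UNDEFINED (UINT32_C(1) << 30)

/* The cache is atomic, so concurrent first calls do not race on it. */
static _Atomic uint32_t g_features = FEATURE_UNDEFINED;

/* Placeholder for real CPU detection; returns a fixed bitmask here. */
static uint32_t detect_features(void) { return UINT32_C(0x5); }

uint32_t get_features(void) {
  uint32_t f = atomic_load_explicit(&g_features, memory_order_relaxed);
  if (f != FEATURE_UNDEFINED) {
    return f; /* fast path: already cached */
  }
  f = detect_features();
  assert(f != FEATURE_UNDEFINED); /* detection must not yield the sentinel */
  /* Several threads may reach this point at once; they all store the
     same value, so the duplicated work is harmless. */
  atomic_store_explicit(&g_features, f, memory_order_relaxed);
  return f;
}
```

Relaxed ordering is enough in this sketch because the cached word is the only data shared between threads; nothing else is published through it.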
Jack O'Connor 760ed6a8bf version 1.4.1
Changes since 1.4.0:
- Improved performance in the ARM NEON implementation for both C and
  Rust callers. This affects AArch64 targets by default and ARMv7
  targets that explicitly enable (and support) NEON. The size of the
  improvement depends on the microarchitecture, but I've benchmarked
  ~1.3x on a Cortex-A53 and ~1.2x on an Apple M1. Contributed by
  @sdlyyxy in #319.
- The MSRV is now 1.66.1 for both the `blake3` crate and `b3sum`.
2023-07-06 14:30:32 -07:00
Jack O'Connor f7e1a7429f retain the old NEON rotations in inline comments 2023-07-05 10:29:02 -07:00
sdlyyxy 7038dad280 NEON rot7/rot12 use shl+sri 2023-07-05 13:28:45 -04:00
sdlyyxy a03b7af061 NEON: only use __builtin_shufflevector on clang 2023-07-05 13:28:45 -04:00
sdlyyxy 38a06e78d3 Improve NEON rot16/rot8 2023-07-05 13:28:45 -04:00
1f604 e47e570691 Fix typo exendable -> extendable 2023-06-27 11:31:51 -04:00
Henrik S. Gaßmann 3f396d2239 build(CMake): Rework NEON detection
Given the myriad of `-mfpu` options for ARM [1], the inability to
portably query for CPU support, and the lack of standardized ISA names
we have no other choice, but to opt out of automatically supplying NEON
compile flags. Instead we simply add the NEON optimized source file if
we detect an ISA with guaranteed NEON support (>= ARMv8) or the user
explicitly requests it (in which case he is expected to provide the
compile flags with `CMAKE_C_FLAGS` or `BLAKE3_CFLAGS_NEON` either
through a toolchain file or commandline parameters).

[1]: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
2023-06-17 22:57:45 -04:00
Jack O'Connor 74220e2ca1 correct the VERSION in CMakeLists.txt 2023-06-17 13:22:26 -07:00
Henrik S. Gaßmann 072bef5bf0 build(CMake): Fix pkg-config directory specification
- Properly add the install prefix to the libdir and includedir.
- Define the prefix once.
- Quote paths which may contain whitespace.
2023-06-17 14:57:25 -04:00
Jack O'Connor 65733a753b version 1.4.0
Changes since 1.3.3:
- The C implementation provides a `CMakeLists.txt` for callers who build
  with CMake. The CMake build is not yet stable, and callers should
  expect breaking changes in patch version updates. The "by hand" build
  will always continue to be supported and documented.
- `b3sum` supports the `--seek` flag, to set the starting position in
  the output stream.
- `b3sum --check` prints a summary of errors to stderr.
- `Hash::as_bytes` is const.
- `Hash` supports `from_bytes`, which is const.
2023-06-08 13:06:32 -07:00
Henrik S. Gaßmann 76f9339312 build(cmake): Print the active SIMD configuration 2023-05-24 13:31:00 -07:00
Henrik S. Gaßmann 0e872a02ea build(cmake): Properly configure dispatcher for no SIMD
If no SIMD support could be configured we need to inform
`blake3_dispatch.c` about it.
2023-05-24 13:31:00 -07:00
Henrik S. Gaßmann 962d5f757e build(cmake): Correctly detect x86 and arm64 Windows
The ISA names communicated by `CMAKE_SYSTEM_PROCESSOR` aren't as
standardized as one would wish. Factor the different names
into lists allowing for simpler checks and future updates.

Add hidden options for enabling SIMD support in case ISA detection
fails. These should only be used as temporary workarounds until the
ISA name lists have been updated/fixed.
2023-05-24 13:31:00 -07:00
Jack O'Connor ef5679ef7b Update c/CMakeLists.txt
Co-authored-by: Henrik Gaßmann <BurningEnlightenment@users.noreply.github.com>
2023-05-23 14:48:45 -07:00
Jack O'Connor afebadf4a0 Update c/CMakeLists.txt
Co-authored-by: Henrik Gaßmann <BurningEnlightenment@users.noreply.github.com>
2023-05-23 14:48:45 -07:00
Henrik S. Gaßmann 1a9dd71681 Explicitly specify C symbol visibility
In order for blake3 to be usable as a shared library on Windows it is
required to annotate public symbols. Use this as an opportunity to prune
the symbol table for other OSes, too.
2023-05-23 14:48:45 -07:00
Henrik S. Gaßmann 4bb0466579 Refactor CMake buildsystem to be portable and modern
Aggregate source files directly in the target instead of a proxy
variable.

Install CMake package config files in order to allow the project to be
found via `find_package()` by dependents.

Replace hard coded SIMD compiler flags with configurable options. Retain
the current GCC/Clang flags as defaults for these compilers. Add default
SIMD compiler flags for MSVC.

Remove hard coded compiler flags (including -fPIC). These are not
portable and should be set by the toolchain file or on the CLI.

- Guard ASM sources with triplet compatibility checks.
- Remove the `BLAKE3_STATIC` option in favor of [`BUILD_SHARED_LIBS`].

[`BUILD_SHARED_LIBS`]: https://cmake.org/cmake/help/v3.9/variable/BUILD_SHARED_LIBS.html
2023-05-23 14:48:45 -07:00
Joel Rosdahl 2dd4e57f68 Fix typos 2023-05-23 14:39:27 -07:00
SteveGremory b0a3863c06 Minor changes to CMake, added SSE support. Added options to only make either static or shared libs. 2023-05-01 00:59:56 -07:00
SteveGremory 3d8a673f59 Fixed on macOS 2023-05-01 00:59:56 -07:00
SteveGremory b494d215e5 Hotfix CMakeLists.txt 2023-05-01 00:59:56 -07:00
SteveGremory 1569e34555 Added CMake support, CMakeLists.txt taken from issue 102 2023-05-01 00:59:56 -07:00
Samuel Neves 9ac0a9b896 correct SSSE3 detection; fixes #300
SSSE3 is indicated by bit 9 of ECX, not bit 0, which indicates the
presence of SSE3.

There are very few CPUs in use affected by this bug; SSE3 was part of
the Prescott new instructions, introduced in the later Pentium 4 chips,
whereas SSSE3 was introduced in Intel's Core 2 and AMD's Bulldozer. This
leaves a few Pentium 4 and Athlon 64 models that will potentially run an
illegal pshufb or pblendw.
2023-04-21 21:28:01 +01:00
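The difference between the two bits can be shown with a small sketch that takes the ECX value from CPUID leaf 1 as a plain integer, so it runs on any platform. The macro and function names are hypothetical and are not the identifiers used in `blake3_dispatch.c`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX register: bit 0 is SSE3, bit 9 is SSSE3. */
#define CPUID1_ECX_SSE3  (UINT32_C(1) << 0)
#define CPUID1_ECX_SSSE3 (UINT32_C(1) << 9)

static_assert(CPUID1_ECX_SSSE3 == 0x200u, "SSSE3 is bit 9 of ECX");

/* True if the given ECX value reports SSE3. */
bool ecx_has_sse3(uint32_t ecx) { return (ecx & CPUID1_ECX_SSE3) != 0; }

/* True if the given ECX value reports SSSE3. */
bool ecx_has_ssse3(uint32_t ecx) { return (ecx & CPUID1_ECX_SSSE3) != 0; }
```

An ECX value of `0x1` models a CPU with SSE3 but no SSSE3: testing bit 0 would wrongly enable the SSSE3 code path, while testing bit 9 does not.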
Jack O'Connor a9750c7fec upgrade all Cargo.toml files to edition=2021
The MSRV is already 1.60, so this doesn't affect much. The only impact
to other code is that we no longer need to explicitly import TryInto.
2023-03-25 16:36:37 -07:00
namazso c303437aab Correct section names on Windows GNU assembly 2023-01-23 11:19:19 -08:00
Alberto González Palomo 606a5825d9 Make sign conversion explicit. Fix #287.
Implicit sign conversions cause warnings when using -Wsign-conversion
but that is easy to avoid by making the conversions explicit.
2023-01-19 13:13:32 -08:00
Jack O'Connor 67e4d04a3c version 1.3.3
Changes since 1.3.2:
- Fix incorrect output from AVX-512 intrinsics under GCC 5.4 and 6.1 in
  debug mode. This bug was found in unit tests and probably doesn't
  affect the public API in practice. See
  https://github.com/BLAKE3-team/BLAKE3/issues/271.
2022-11-26 00:31:40 -05:00
Jack O'Connor 342f9f8067 fix incorrect output from AVX-512 intrinsics in debug mode under GCC 5.4 and 6.1
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/271.

The `_mm512_cmp_epu32_mask` intrinsic is broken under GCC 5.4 and 6.1.
This led to incorrect output in the AVX-512 implementation when building
with intrinsics instead of assembly. This fix is a simplified version of
Samuel's proposed fix here:
f10816e857 (commitcomment-90742995)
2022-11-23 14:14:19 -08:00
Jack O'Connor 5dad698d3f test multiple initial counter values for hash_many
I'm adding the i32::MAX test case here because I personally screwed it
up while I was working on
https://github.com/BLAKE3-team/BLAKE3/issues/271. The correct
implementation of the carry bit is the ANDNOT of old high bit (1) and
the new high bit (0). Using XOR instead of ANDNOT gives the correct
answer in the overflow case, but it also reports an incorrect "extra"
overflow when the high bit goes from 0 to 1.
2022-11-22 23:31:29 -08:00
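The carry logic described above can be sketched on scalar 32-bit values. This is an illustration, not the SIMD code from the repository: the function names are hypothetical, and the sketch assumes the increment is smaller than 2^31, which is what makes the high-bit comparison sufficient.

```c
#include <assert.h>
#include <stdint.h>

/* Carry out of the low 32-bit counter word: the old high bit was 1
   and the new high bit is 0. Assumes inc < 2^31. */
uint32_t carry_andnot(uint32_t old_lo, uint32_t inc) {
  assert(inc < UINT32_C(0x80000000));
  uint32_t new_lo = old_lo + inc; /* wraps modulo 2^32 */
  return (old_lo & ~new_lo) >> 31;
}

/* Incorrect variant: flags any change of the high bit, including 0 -> 1. */
uint32_t carry_xor(uint32_t old_lo, uint32_t inc) {
  assert(inc < UINT32_C(0x80000000));
  uint32_t new_lo = old_lo + inc;
  return (old_lo ^ new_lo) >> 31;
}
```

Both variants report a carry when `0xFFFFFFFF` is incremented by 1. Starting from `0x7FFFFFFF` (i32::MAX), the ANDNOT version reports no carry, while the XOR version reports the spurious extra overflow the commit message mentions.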
Jack O'Connor 537e96747a version 1.3.2:
Changes since 1.3.1:
- Dependency updates only. This includes updating Clap to v4, which
  changes the format of the `b3sum --help` output. The new MSRV is
  1.59.0 for `blake3` and 1.60.0 for `b3sum`. Note that this project
  doesn't have any particular MSRV policy, and we don't consider MSRV
  bumps to be breaking changes.
2022-11-20 15:29:45 -08:00
wargio cf5d59cd43 Support portable build without intrinsics 2022-10-03 11:24:18 +02:00
Jack O'Connor e733e5ac98 fix another instance of the same typo 2022-07-28 14:15:13 -07:00
Jack O'Connor 09df11731e replace a copy-pasted Rust API reference in the C docs 2022-07-22 10:48:33 -07:00
Fangrui Song 9114ff8ed1 add prototypes to fix -Wstrict-prototypes warnings 2022-04-09 11:00:17 -07:00