divinity76
58bea0bcbb
optimize neon loadu_128/storeu_128 (#384)
vld1q_u8 and vst1q_u8 have no alignment requirements.
This improves performance on Oracle Cloud's VM.Standard.A1.Flex by 1.15% on a 16*1024 input, from approximately 13920 nanoseconds down to 13800 nanoseconds.
2024-03-12 03:21:51 -04:00
Jack O'Connor
f7e1a7429f
retain the old NEON rotations in inline comments
2023-07-05 10:29:02 -07:00
sdlyyxy
7038dad280
NEON rot7/rot12 use shl+sri
2023-07-05 13:28:45 -04:00
sdlyyxy
a03b7af061
NEON: only use __builtin_shufflevector on clang
2023-07-05 13:28:45 -04:00
sdlyyxy
38a06e78d3
Improve NEON rot16/rot8
2023-07-05 13:28:45 -04:00
Jack O'Connor
080b333015
explicitly #error on big-endian ARM
2021-08-24 15:00:15 -04:00
Jack O'Connor
a7579d30ad
merge BLAKE3-c into this repo
This is commit 4476d9da0e370993823e7ad17592b84e905afd76 of
https://github.com/veorq/BLAKE3-c .
2020-01-09 09:48:52 -05:00