This gives the assembly files the same prefix as the intrinsics files, which
simplifies builds where the build system needs to choose between the assembly
and the intrinsics files.
Currently this requires setting the BLAKE3_USE_NEON preprocessor flag.
In the future, we may enable this automatically on AArch32/64 or add some
kind of dynamic feature detection, though ARM makes that harder than x86
does.
As part of this, get rid of the IS_ARM flag. It wasn't being set
properly when I tried it on a Raspberry Pi.
Closes #30.