Implementation of BLAKE3, originating from https://github.com/BLAKE3-team/BLAKE3/tree/1.3.1/c

An example program that hashes bytes from standard input and prints the
result:

Using the C++ API:

    #include "llvm/Support/BLAKE3.h"

    // Initialize the hasher.

    // Read input bytes from stdin.
    ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
    hasher.update(llvm::StringRef(buf, n));
    fprintf(stderr, "read failed: %s\n", strerror(errno));

    // Finalize the hash. Default output length is 32 bytes.
    auto output = hasher.final();

    // Print the hash as hexadecimal.
    for (uint8_t byte : output) {

Using the C API:

    #include "llvm-c/blake3.h"

    // Initialize the hasher.
    llvm_blake3_hasher hasher;
    llvm_blake3_hasher_init(&hasher);

    // Read input bytes from stdin.
    unsigned char buf[65536];
    ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
    llvm_blake3_hasher_update(&hasher, buf, n);
    fprintf(stderr, "read failed: %s\n", strerror(errno));

    // Finalize the hash. LLVM_BLAKE3_OUT_LEN is the default output length, 32 bytes.
    uint8_t output[LLVM_BLAKE3_OUT_LEN];
    llvm_blake3_hasher_finalize(&hasher, output, LLVM_BLAKE3_OUT_LEN);

    // Print the hash as hexadecimal.
    for (size_t i = 0; i < LLVM_BLAKE3_OUT_LEN; i++) {
      printf("%02x", output[i]);

    llvm_blake3_hasher Hasher;

    } llvm_blake3_hasher;

An incremental BLAKE3 hashing state, which can accept any number of
updates. This implementation doesn't allocate any heap memory, but
`sizeof(llvm_blake3_hasher)` itself is relatively large, currently 1912 bytes
on x86-64. This size can be reduced by restricting the maximum input
length, as described in Section 5.4 of [the BLAKE3
spec](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf),
but this implementation doesn't currently support that strategy.

## Common API Functions

    void llvm_blake3_hasher_init(
      llvm_blake3_hasher *self);

Initialize a `llvm_blake3_hasher` in the default hashing mode.

    void BLAKE3::update(ArrayRef<uint8_t> Data);

    void BLAKE3::update(StringRef Str);

    void llvm_blake3_hasher_update(
      llvm_blake3_hasher *self,
      const void *input,
      size_t input_len);

Add input to the hasher. This can be called any number of times.

    template <size_t NumBytes = LLVM_BLAKE3_OUT_LEN>
    using BLAKE3Result = std::array<uint8_t, NumBytes>;

    template <size_t NumBytes = LLVM_BLAKE3_OUT_LEN>
    void BLAKE3::final(BLAKE3Result<NumBytes> &Result);

    template <size_t NumBytes = LLVM_BLAKE3_OUT_LEN>
    BLAKE3Result<NumBytes> BLAKE3::final();

    void llvm_blake3_hasher_finalize(
      const llvm_blake3_hasher *self,
      uint8_t *out,
      size_t out_len);

Finalize the hasher and return an output of any length, given in bytes.
This doesn't modify the hasher itself, and it's possible to finalize
again after adding more input. The constant `LLVM_BLAKE3_OUT_LEN` provides
the default output length, 32 bytes, which is recommended for most
callers.

Outputs shorter than the default length of 32 bytes (256 bits) provide
less security. An N-bit BLAKE3 output is intended to provide N bits of
first and second preimage resistance and N/2 bits of collision
resistance, for any N up to 256. Longer outputs don't provide any
additional security.

Shorter BLAKE3 outputs are prefixes of longer ones. Explicitly
requesting a short output is equivalent to truncating the default-length
output. (BLAKE2 behaves differently: there, the output length is a hash
parameter, so a shorter output is not a prefix of a longer one.)
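
The security levels described above can be turned into a small calculation. The following is a minimal sketch, not part of the BLAKE3 or LLVM API: the type `blake3_security_estimate` and the function `blake3_estimate_security` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type; not part of the BLAKE3 or LLVM API. */
typedef struct {
  unsigned preimage_bits;  /* first and second preimage resistance */
  unsigned collision_bits; /* collision resistance */
} blake3_security_estimate;

/* Intended security of an output of out_len bytes: N bits of preimage
   resistance and N/2 bits of collision resistance, where N is the output
   size in bits, capped at 256. */
blake3_security_estimate blake3_estimate_security(size_t out_len) {
  assert(out_len > 0); /* a zero-length output carries no security */
  size_t bits = out_len > 32 ? 256 : out_len * 8;
  blake3_security_estimate e;
  e.preimage_bits = (unsigned)bits;
  e.collision_bits = (unsigned)(bits / 2);
  return e;
}
```

Under this sketch, a 16-byte output is estimated at 128 bits of preimage resistance and 64 bits of collision resistance, while a 64-byte output is estimated the same as the default 32 bytes.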

## Less Common API Functions

    void llvm_blake3_hasher_init_keyed(
      llvm_blake3_hasher *self,
      const uint8_t key[LLVM_BLAKE3_KEY_LEN]);

Initialize a `llvm_blake3_hasher` in the keyed hashing mode. The key must be
exactly `LLVM_BLAKE3_KEY_LEN` bytes long.

    void llvm_blake3_hasher_init_derive_key(
      llvm_blake3_hasher *self,
      const char *context);

Initialize a `llvm_blake3_hasher` in the key derivation mode. The context
string is given as an initialization parameter, and afterwards input key
material should be given with `llvm_blake3_hasher_update`. The context string
is a null-terminated C string which should be **hardcoded, globally
unique, and application-specific**. The context string should not
include any dynamic input like salts, nonces, or identifiers read from a
database at runtime. A good default format for the context string is
`"[application] [commit timestamp] [purpose]"`, e.g.,
`"example.com 2019-12-25 16:18:03 session tokens v1"`.

This function is intended for application code written in C. For
language bindings, see `llvm_blake3_hasher_init_derive_key_raw` below.

    void llvm_blake3_hasher_init_derive_key_raw(
      llvm_blake3_hasher *self,
      const void *context,
      size_t context_len);

As `llvm_blake3_hasher_init_derive_key` above, except that the context string
is given as a pointer to an array of arbitrary bytes with a provided
length. This is intended for writing language bindings, where C string
conversion would add unnecessary overhead and new error cases. Unicode
strings should be encoded as UTF-8.

Application code in C should prefer `llvm_blake3_hasher_init_derive_key`,
which takes the context as a C string. If you need to use arbitrary
bytes as a context string in application code, consider whether you're
violating the requirement that context strings should be hardcoded.

    void llvm_blake3_hasher_finalize_seek(
      const llvm_blake3_hasher *self,
      uint64_t seek,
      uint8_t *out,
      size_t out_len);

The same as `llvm_blake3_hasher_finalize`, but with an additional `seek`
parameter for the starting byte position in the output stream. To
efficiently stream a large output without allocating memory, call this
function in a loop, incrementing `seek` by the output length each time.
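
The loop described above can be sketched as follows. This is an illustration under stated assumptions, not code from this implementation: `xof_fn`, `sink_fn`, `collector`, `collect`, and `stream_output` are hypothetical names, and `toy_xof` is a toy stand-in that is NOT BLAKE3. The callback shape (state, seek position, output pointer, output length) is an assumption modeled on the description of `llvm_blake3_hasher_finalize_seek`; a real caller would invoke that function directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback types, introduced for this sketch only. */
typedef void (*xof_fn)(const void *state, uint64_t seek,
                       uint8_t *out, size_t out_len);
typedef void (*sink_fn)(void *ctx, const uint8_t *data, size_t len);

/* Toy stand-in for a seekable output stream; NOT BLAKE3. The byte at
   stream position p depends only on p. */
void toy_xof(const void *state, uint64_t seek, uint8_t *out, size_t out_len) {
  (void)state;
  for (size_t i = 0; i < out_len; i++)
    out[i] = (uint8_t)((seek + i) * 31u + 7u);
}

/* Example sink that appends each chunk to a caller-provided array. */
typedef struct { uint8_t *dst; size_t len; } collector;
void collect(void *ctx, const uint8_t *data, size_t len) {
  collector *c = (collector *)ctx;
  memcpy(c->dst + c->len, data, len);
  c->len += len;
}

/* Produce `total` output bytes through one fixed-size buffer, advancing
   `seek` by the number of bytes produced on each iteration. */
void stream_output(xof_fn xof, const void *state, uint64_t total,
                   uint8_t *buf, size_t buf_len, sink_fn sink, void *ctx) {
  assert(buf_len > 0);
  uint64_t seek = 0;
  while (seek < total) {
    uint64_t remaining = total - seek;
    size_t n = remaining < buf_len ? (size_t)remaining : buf_len;
    xof(state, seek, buf, n);
    sink(ctx, buf, n);
    seek += n;
  }
}
```

Because each call starts at an explicit byte position, streaming through a small buffer yields the same bytes as one large call, and the only memory needed is the reusable buffer.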

    void llvm_blake3_hasher_reset(
      llvm_blake3_hasher *self);

Reset the hasher to its initial state, prior to any calls to
`llvm_blake3_hasher_update`. Currently this is no different from calling
`llvm_blake3_hasher_init` or similar again. However, if this implementation gains
multithreading support in the future, and if `llvm_blake3_hasher` holds (optional)
threading resources, this function will reuse those resources.

This implementation is just C and assembly files.

Dynamic dispatch is enabled by default on x86. The implementation will
query the CPU at runtime to detect SIMD support, and it will use the
widest instruction set available. By default, `blake3_dispatch.c`
expects to be linked with code for five different instruction sets:
portable C, SSE2, SSE4.1, AVX2, and AVX-512.
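
The "widest instruction set available" rule can be sketched as a simple priority check. This is a simplified illustration, not the logic in `blake3_dispatch.c`: the feature bits, the `backend` enum, and `choose_backend` are hypothetical names, and the feature mask is passed in by the caller rather than queried from the CPU. Real detection involves CPUID queries and more conditions than a single bit per instruction set.

```c
#include <assert.h>

/* Hypothetical feature bits, for illustration only. */
enum cpu_feature {
  FEAT_SSE2 = 1 << 0,
  FEAT_SSE41 = 1 << 1,
  FEAT_AVX2 = 1 << 2,
  FEAT_AVX512 = 1 << 3
};

/* The five code paths named in the text, ordered narrowest to widest. */
enum backend {
  BACKEND_PORTABLE,
  BACKEND_SSE2,
  BACKEND_SSE41,
  BACKEND_AVX2,
  BACKEND_AVX512
};

/* Pick the widest backend whose feature bit is set, falling back to the
   portable C code when no SIMD feature is reported. */
enum backend choose_backend(unsigned features) {
  assert((features & ~0xFu) == 0); /* unknown bits are a caller bug */
  if (features & FEAT_AVX512) return BACKEND_AVX512;
  if (features & FEAT_AVX2) return BACKEND_AVX2;
  if (features & FEAT_SSE41) return BACKEND_SSE41;
  if (features & FEAT_SSE2) return BACKEND_SSE2;
  return BACKEND_PORTABLE;
}
```

Checking from widest to narrowest means a CPU that reports several features always gets the widest one, and an empty mask selects the portable path.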

For each of the x86 SIMD instruction sets, four versions are available:
three flavors of assembly (Unix, Windows MSVC, and Windows GNU) and one
version using C intrinsics. The assembly versions are generally
preferred. They perform better, they perform more consistently across
different compilers, and they build more quickly. On the other hand, the
assembly versions are x86\_64-only, and you need to select the right
flavor for your target platform.

The NEON implementation is enabled by default on AArch64, but not on
other ARM targets, since not all of them support it. To enable it, set

To explicitly disable using NEON instructions on AArch64, set

The portable implementation should work on most other architectures.

The implementation doesn't currently support multithreading.