# The BLAKE3 Guts API

## Introduction

This [`blake3_guts`](https://crates.io/crates/blake3_guts) sub-crate contains
low-level, high-performance, platform-specific implementations of the BLAKE3
compression function. This API is complicated and unsafe, and this crate will
never have a stable release. Most callers should instead use the
[`blake3`](https://crates.io/crates/blake3) crate, which will eventually depend
on this one internally.

The code you see here (as of January 2024) is an early stage of a large planned
refactor. The motivation for this refactor is a couple of missing features in
both the Rust and C implementations:

- The output side
  ([`OutputReader`](https://docs.rs/blake3/latest/blake3/struct.OutputReader.html)
  in Rust) doesn't take advantage of the most important SIMD optimizations,
  which compute multiple blocks in parallel. This holds back any project that
  wants to use the BLAKE3 XOF as a stream cipher
  ([[1]](https://github.com/oconnor663/bessie),
  [[2]](https://github.com/oconnor663/blake3_aead)).
- Low-level callers like [Bao](https://github.com/oconnor663/bao) that need
  interior nodes of the tree also don't get those SIMD optimizations. They have
  to use a slow, minimalistic, unstable, doc-hidden module [(also called
  `guts`)](https://github.com/BLAKE3-team/BLAKE3/blob/master/src/guts.rs).

The difficulty with adding those features is that they require changes to all
of our optimized assembly and C intrinsics code. That's a couple dozen
different files that are large, platform-specific, difficult to understand, and
full of duplicated code. The higher-level Rust and C implementations of BLAKE3
both depend on these files and will need to coordinate changes.

At the same time, it won't be long before we add support for more platforms:

- RISC-V vector extensions
- ARM SVE
- WebAssembly SIMD

It's important to get this refactor done before new platforms make it even
harder to do.

## The private guts API

This is the API that each platform reimplements, so we want it to be as simple
as possible apart from the high-performance work it needs to do. It's
completely `unsafe`, and inputs and outputs are raw pointers that are allowed
to alias (this matters for `hash_parents`, see below).

- `degree`
- `compress`
    - The single compression function, for short inputs and odd-length tails.
- `hash_chunks`
- `hash_parents`
- `xof`
- `xof_xor`
    - As `xof` but XOR'ing the result into the output buffer.
- `universal_hash`
    - This is a new construction specifically to support
      [BLAKE3-AEAD](https://github.com/oconnor663/blake3_aead). Some
      implementations might just stub it out with portable code.
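
To make the relationship between `xof` and `xof_xor` concrete, here is a
minimal sketch in safe Rust. Everything in it is an assumption for
illustration: `toy_block` is a non-cryptographic stand-in for the real
compression function, and the function names and the `seed`/`counter`
parameters are hypothetical, not this crate's actual signatures.

```rust
/// Toy, NON-cryptographic stand-in for one 64-byte block of XOF output.
/// (Hypothetical helper; the real thing would call the compression function.)
fn toy_block(seed: u64, counter: u64) -> [u8; 64] {
    let mut state = seed ^ counter.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    let mut block = [0u8; 64];
    for byte in block.iter_mut() {
        // One step of a simple linear congruential generator.
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        *byte = (state >> 56) as u8;
    }
    block
}

/// Overwrite `out` with output blocks, starting at block index `counter`.
fn toy_xof(seed: u64, mut counter: u64, out: &mut [u8]) {
    for chunk in out.chunks_mut(64) {
        let block = toy_block(seed, counter);
        // The last chunk may be shorter than a full block.
        chunk.copy_from_slice(&block[..chunk.len()]);
        counter += 1;
    }
}

/// Same stream as `toy_xof`, but XOR'ed into whatever `out` already holds.
fn toy_xof_xor(seed: u64, mut counter: u64, out: &mut [u8]) {
    for chunk in out.chunks_mut(64) {
        let block = toy_block(seed, counter);
        for (o, k) in chunk.iter_mut().zip(block.iter()) {
            *o ^= *k;
        }
        counter += 1;
    }
}
```

XOR'ing the same stream into a buffer twice restores the original bytes, which
is the property a stream cipher built on the XOF relies on. Fusing the XOR into
the output loop also avoids a separate pass over a temporary buffer.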

## The public guts API

This is the API that this crate exposes to callers, i.e. to the main `blake3`
crate. It's a thin, portable layer on top of the private API above. The Rust
version of this API is memory-safe.

- `degree`
- `compress`
- `hash_chunks`
- `hash_parents`
    - This handles most levels of the tree, where we keep hashing `SIMD_DEGREE`
      parents at a time.
- `reduce_parents`
    - This uses the same `hash_parents` private API, but it handles the top
      levels of the tree where we reduce in-place to the root parent node.
- `xof`
- `xof_xor`
- `universal_hash`
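
The in-place reduction described for `reduce_parents` can be sketched as
follows. This is a toy illustration rather than this crate's code:
`toy_parent` is a non-cryptographic stand-in for real parent hashing, all names
and signatures are hypothetical, and details like flags, keys, and root
finalization are left out. Carrying an unpaired last value up a level is also
an assumption of this sketch. It shows why the private API's permission to
alias matters: `toy_hash_parents` writes parent `i` into a slot that children
`2i` and `2i + 1` no longer need.

```rust
/// Toy stand-in for a 32-byte chaining value.
type Cv = [u8; 32];

/// Toy, NON-cryptographic stand-in for hashing one parent from two children.
fn toy_parent(left: &Cv, right: &Cv) -> Cv {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = left[i].wrapping_mul(31).wrapping_add(right[i].rotate_left(3)) ^ (i as u8);
    }
    out
}

/// Hypothetical private-style entry point: reads `2 * num_parents` child CVs
/// from `input` and writes `num_parents` parent CVs to `output`.
///
/// # Safety
/// `input` must be valid for reading `2 * num_parents` CVs and `output` must
/// be valid for writing `num_parents` CVs. They may point into the same buffer.
unsafe fn toy_hash_parents(input: *const Cv, num_parents: usize, output: *mut Cv) {
    for i in 0..num_parents {
        unsafe {
            // Copy both children out before writing. Parent i lands at index
            // i, which is never ahead of children 2i and 2i + 1, so aliasing
            // input and output is harmless.
            let left = input.add(2 * i).read();
            let right = input.add(2 * i + 1).read();
            output.add(i).write(toy_parent(&left, &right));
        }
    }
}

/// Safe wrapper: reduce a non-empty list of CVs to a single root, in place.
fn toy_reduce_parents(cvs: &mut Vec<Cv>) -> Cv {
    assert!(!cvs.is_empty());
    while cvs.len() > 1 {
        let num_parents = cvs.len() / 2;
        let odd = cvs.len() % 2 == 1;
        let ptr = cvs.as_mut_ptr();
        // SAFETY: `ptr` is valid for `cvs.len() >= 2 * num_parents` elements,
        // and input and output deliberately alias.
        unsafe { toy_hash_parents(ptr as *const Cv, num_parents, ptr) };
        if odd {
            // Carry the unpaired last CV up to the next level unchanged.
            let last = cvs[cvs.len() - 1];
            cvs[num_parents] = last;
        }
        cvs.truncate(num_parents + odd as usize);
    }
    cvs[0]
}
```

The safe wrapper owns the length bookkeeping, so the `unsafe` function only has
to get one loop right. That split mirrors the idea of a thin, memory-safe layer
sitting on top of raw-pointer platform code.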