* File Systems
:PROPERTIES:
:ID: 29f7085e-53a3-4d70-90a7-e3437ee99775
:END:
** BTRFS
:PROPERTIES:
:ID: 3e7bb82e-7dae-476a-8ed0-18c361c9bd1b
:END:
#+begin_quote
BTRFS is a Linux filesystem based on copy-on-write, allowing for
efficient snapshots and clones.

It uses B-trees as its main on-disk data structure. The design goal is
to work well for many use cases and workloads. To this end, much
effort has been directed to maintaining even performance as the
filesystem ages, rather than trying to support a particular narrow
benchmark use-case.

Linux filesystems are installed on smartphones as well as enterprise
servers. This entails challenges on many different fronts.

- Scalability :: The filesystem must scale in many dimensions: disk
  space, memory, and CPUs.

- Data integrity :: Losing data is not an option, and much effort is
  expended to safeguard the content. This includes checksums, metadata
  duplication, and RAID support built into the filesystem.

- Disk diversity :: The system should work well with SSDs and hard
  disks. It is also expected to be able to use an array of different
  sized disks, which poses challenges to the RAID and striping
  mechanisms.
#+end_quote
-- [cite/t/f:@btrfs]
*** [2023-08-08 Tue] btrfs performance speculation ::
:PROPERTIES:
:ID: 2b662144-97b2-4736-a8fc-bc8f861b9829
:END:
- [[https://www.percona.com/blog/taking-a-look-at-btrfs-for-mysql/]]
  - zfs outperforms immensely, but potential misconfiguration on btrfs side (virt+cow
    still enabled?)
- https://www.ctrl.blog/entry/btrfs-vs-ext4-performance.html
  - see the follow up comment on this post
    - https://www.reddit.com/r/archlinux/comments/o2gc42/is_the_performance_hit_of_btrfs_serious_is_it/
#+begin_quote
I’m the author of OP’s first link. I use BtrFS today. I often shift lots of
de-duplicatable data around, and benefit greatly from file cloning. The data is actually
the same data that caused the slow performance in the article. BtrFS and file cloning
now performs this task quicker than a traditional file system. (Hm. It’s time for a
follow-up article.)

In a laptop with one drive: it doesn’t matter too much unless you do work that benefit
from file cloning or snapshots. This will likely require you to adjust your tooling and
workflow. I’ve had to rewrite the software I use every day to make it take advantage of
the capabilities of a more modern file system. You won’t benefit much from the data
recovery and redundancy features unless you’ve got two storage drives in your laptop and
can setup redundant data copies.

on similar hardware to mine?

It’s not a question about your hardware as much as how you use it. The bad performance I
documented was related to lots and lots of simultaneous random reads and writes. This
might not be representative of how you use your computer.
#+end_quote
- https://dl.acm.org/doi/fullHtml/10.1145/3386362
  - this is about distributed file systems (in this case Ceph) - they argue against
    basing DFS on ondisk-format filesystems (XFS ext4) - developed BlueStore as
    backend, which runs directly on raw storage hardware.
  - this is a good approach, but expensive (2 years in development) and risky
  - better approach is to take advantage of a powerful enough existing ondisk-FS
    format and pair it with supporting modules which abstract away the 'distributed'
    mechanics.
  - the strategy presented here is critical for enterprise-grade hardware where the
    ondisk filesystem becomes the bottleneck that you're looking to optimize
- https://lore.kernel.org/lkml/cover.1676908729.git.dsterba@suse.com/
  - linux 6.3 patch by David Sterba [2023-02-20 Mon]
  - btrfs continues to show improvements in the linux kernel, ironing out the kinks
  - makes it hard to compare benchmarks tho :/
*** MacOS support
:PROPERTIES:
:ID: 9ddb1caf-014a-4d0f-972e-82028e8be286
:END:
- see this WIP kext (kernel extension) for macOS: [[https://github.com/relalis/macos-btrfs][macos-btrfs]]
  - maybe we can help out with the VFS/mount support
*** on-disk format
:PROPERTIES:
:ID: 7535f844-330b-4f9c-b2a1-4578d64acbf7
:END:
- [[https://btrfs.readthedocs.io/en/latest/dev/On-disk-format.html][on-disk-format]]
- 'btrfs consists entirely of several trees. the trees use copy-on-write.'
- trees are stored in nodes which belong to a level in the b-tree structure.
  - internal nodes (abbreviated 'inode' in these notes, not to be confused with file
    inodes) contain refs to other inodes on the /next/ level OR to leaf nodes when the
    level reaches 0.
  - leaf nodes contain various item types depending on the tree.
- basic structures (fields in this section are written offset:size, both in hex)
  - key (0x11 bytes)
    - 0:8 uint = objectid, each tree has its own set of object IDs
    - 8:1 uint = item type
    - 9:8 uint = offset, depends on type.
  - little-endian
  - fields are unsigned
- *superblock*
  - primary superblock is located at 0x10000 (64 KiB)
  - mirror copies of the superblock are located at physical addresses 0x4000000 (64
    MiB) and 0x4000000000 (256 GiB), if valid. copies are updated simultaneously.
  - during mount only the first superblock at 0x10000 is read, error causes mount to
    fail.
  - BTRFS only recognizes disks with a valid 0x10000 superblock.
- *header*
  - stored at the start of every node (0x65 bytes)
  - data following it depends on whether it is an internal or leaf node.
- *inode*
  - node header followed by a number of key pointers
    - 0:11 key
    - 11:8 uint = block number
    - 19:8 uint = generation
- *lnode*
  - leaf nodes contain the header followed by items
    - 0:11 key
    - 11:4 uint = data offset relative to end of header (0x65)
    - 15:4 uint = data size
- objects
  - ROOT_TREE
    - holds ROOT_ITEMs, ROOT_REFs, and ROOT_BACKREFs for every tree other than itself.
    - used to find the other trees and to determine the subvol structure.
    - holds items for the 'root tree directory'. its logical address (laddr) is stored
      in the superblock
- objectIDs
  - free ids range from BTRFS_FIRST_FREE_OBJECTID (256ULL) to BTRFS_LAST_FREE_OBJECTID (-256ULL)
  - ids outside that range are reserved for internal use
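The field layouts above can be sketched as plain Rust parsers. This is only a minimal illustration under stated assumptions, not the kernel's or any crate's API: the type and function names (~DiskKey~, ~KeyPtr~, ~LeafItem~, ~superblock_offsets~, ~is_free_objectid~) are hypothetical, the offsets are read as hexadecimal, and the 4 KiB superblock length is an assumption that does not appear in the notes.

#+begin_src rust
/// Size of a key: 8-byte objectid + 1-byte item type + 8-byte offset.
pub const KEY_SIZE: usize = 0x11;
/// Size of the header stored at the start of every node.
pub const HEADER_SIZE: usize = 0x65;
/// Superblock length in bytes (assumption: 4 KiB, not stated in the notes).
pub const SUPERBLOCK_SIZE: u64 = 0x1000;
/// Physical offsets of the primary superblock and its two mirrors.
pub const SUPERBLOCK_OFFSETS: [u64; 3] = [0x1_0000, 0x400_0000, 0x40_0000_0000];
pub const FIRST_FREE_OBJECTID: u64 = 256;
/// -256ULL reinterpreted as an unsigned 64-bit value.
pub const LAST_FREE_OBJECTID: u64 = (-256_i64) as u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DiskKey {
    pub objectid: u64,
    pub item_type: u8,
    pub offset: u64,
}

/// Entry of an internal node: a key plus the child block it points to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeyPtr {
    pub key: DiskKey,
    pub block_number: u64,
    pub generation: u64,
}

/// Entry of a leaf node: a key plus the location of the item data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LeafItem {
    pub key: DiskKey,
    pub data_offset: u32,
    pub data_size: u32,
}

/// Copies N bytes starting at `at`, or returns None if `buf` is too short.
fn bytes_at<const N: usize>(buf: &[u8], at: usize) -> Option<[u8; N]> {
    let mut raw = [0u8; N];
    raw.copy_from_slice(buf.get(at..at + N)?);
    Some(raw)
}

impl DiskKey {
    pub fn parse(buf: &[u8]) -> Option<Self> {
        Some(Self {
            objectid: u64::from_le_bytes(bytes_at(buf, 0)?),
            item_type: *buf.get(8)?,
            offset: u64::from_le_bytes(bytes_at(buf, 9)?),
        })
    }
}

impl KeyPtr {
    pub fn parse(buf: &[u8]) -> Option<Self> {
        Some(Self {
            key: DiskKey::parse(buf)?,
            block_number: u64::from_le_bytes(bytes_at(buf, KEY_SIZE)?),
            generation: u64::from_le_bytes(bytes_at(buf, KEY_SIZE + 8)?),
        })
    }
}

impl LeafItem {
    pub fn parse(buf: &[u8]) -> Option<Self> {
        Some(Self {
            key: DiskKey::parse(buf)?,
            data_offset: u32::from_le_bytes(bytes_at(buf, KEY_SIZE)?),
            data_size: u32::from_le_bytes(bytes_at(buf, KEY_SIZE + 4)?),
        })
    }

    /// Byte range of the item data, counted from the start of the leaf block.
    pub fn data_range(&self) -> std::ops::Range<usize> {
        let start = HEADER_SIZE + self.data_offset as usize;
        start..start + self.data_size as usize
    }
}

/// True if `id` lies in the range left free for regular objects.
pub fn is_free_objectid(id: u64) -> bool {
    (FIRST_FREE_OBJECTID..=LAST_FREE_OBJECTID).contains(&id)
}

/// Superblock copies that fit entirely on a device of `device_size` bytes.
pub fn superblock_offsets(device_size: u64) -> Vec<u64> {
    SUPERBLOCK_OFFSETS
        .iter()
        .copied()
        .filter(|off| off + SUPERBLOCK_SIZE <= device_size)
        .collect()
}
#+end_src

With a node block in memory, ~KeyPtr::parse(&block[HEADER_SIZE..])~ would read the first key pointer of an internal node, and ~LeafItem::data_range~ gives the slice of the leaf block that holds an item's data. Returning ~Option~ instead of panicking keeps truncated or corrupt blocks from aborting the caller.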
*** send-stream format
:PROPERTIES:
:ID: 1d1a6211-c91f-48ae-8113-0ddada286cee
:END:
- [[https://btrfs.readthedocs.io/en/latest/dev/dev-send-stream.html][send stream format]]
- Send stream format represents a linear sequence of commands describing actions to be
  performed on the target filesystem (receive side), created on the source filesystem
  (send side).
- The stream is currently used in two ways: to generate a stream representing a
  standalone subvolume (full mode) or a difference between two snapshots of the same
  subvolume (incremental mode).
- The stream can be generated using a set of other subvolumes to look for extent
  references that could lead to a more efficient stream by transferring only the
  references and not full data.
- The stream format is abstracted from on-disk structures (though it may share some
  BTRFS specifics), the stream instructions could be generated by other means than the
  send ioctl.
- it's a checksummed TLV (type-length-value) format
  - header: u32 len, u16 cmd, u32 crc32c
  - data: type, length, raw data
- the v2 protocol supports the encoded commands
- the commands are kinda clunky - need to MKFILE/MKDIR then RENAME to create a file
  or directory
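The header and TLV layout noted above could be walked with something like the following. It is a rough sketch rather than a reference parser. It assumes little-endian fields, that ~len~ counts only the bytes after the 10-byte command header, and that each attribute carries a u16 type and a u16 length (widths the notes do not spell out). The names ~Command~, ~Attr~, ~parse_command~ and ~parse_attrs~ are made up for illustration. It does not verify the crc32c field, skips the stream header that precedes the first command, and ignores any v2-specific attribute encodings.

#+begin_src rust
/// Command header: u32 len + u16 cmd + u32 crc32c, all little-endian.
pub const CMD_HEADER_SIZE: usize = 10;
/// Attribute header: u16 type + u16 length (assumed widths).
pub const TLV_HEADER_SIZE: usize = 4;

#[derive(Debug, PartialEq, Eq)]
pub struct Command<'a> {
    pub cmd: u16,
    pub crc32c: u32,
    pub payload: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub struct Attr<'a> {
    pub attr_type: u16,
    pub data: &'a [u8],
}

fn le_u16(buf: &[u8], at: usize) -> Option<u16> {
    let raw = buf.get(at..at + 2)?;
    Some(u16::from_le_bytes([raw[0], raw[1]]))
}

fn le_u32(buf: &[u8], at: usize) -> Option<u32> {
    let raw = buf.get(at..at + 4)?;
    Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}

/// Splits one command off the front of `buf`; returns it and the remaining bytes.
pub fn parse_command(buf: &[u8]) -> Option<(Command<'_>, &[u8])> {
    let len = le_u32(buf, 0)? as usize;
    let cmd = le_u16(buf, 4)?;
    let crc32c = le_u32(buf, 6)?;
    let end = CMD_HEADER_SIZE.checked_add(len)?;
    let payload = buf.get(CMD_HEADER_SIZE..end)?;
    Some((Command { cmd, crc32c, payload }, &buf[end..]))
}

/// Splits a command payload into its type/length/value attributes.
pub fn parse_attrs(mut payload: &[u8]) -> Option<Vec<Attr<'_>>> {
    let mut attrs = Vec::new();
    while !payload.is_empty() {
        let attr_type = le_u16(payload, 0)?;
        let len = le_u16(payload, 2)? as usize;
        let end = TLV_HEADER_SIZE + len;
        let data = payload.get(TLV_HEADER_SIZE..end)?;
        attrs.push(Attr { attr_type, data });
        payload = &payload[end..];
    }
    Some(attrs)
}
#+end_src

Calling ~parse_command~ in a loop on the remaining bytes yields the linear command sequence, and ~parse_attrs~ on each payload yields that command's arguments. Both borrow from the input buffer, so nothing is copied.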
*** [2023-08-09 Wed] ioctls
:PROPERTIES:
:ID: be04dc90-86a6-46c5-9dcf-25519ebed34d
:END:
- magic number: 0x94
  - https://docs.kernel.org/userspace-api/ioctl/ioctl-number.html
  - listed there as 'Btrfs filesystem', with some ioctls lifted to vfs/generic
  - defined in fs/btrfs/ioctl.h and linux/fs.h
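To show where the 0x94 magic number ends up in a request code, here is a small sketch of the generic Linux ioctl encoding (8-bit command number, 8-bit magic, 14-bit argument size, 2-bit direction). That layout is an assumption that holds on x86 and ARM but differs on some other architectures. The helper names ~ioc~, ~decode~ and ~is_btrfs~ are hypothetical, and the command numbers used below are arbitrary examples.

#+begin_src rust
pub const BTRFS_IOCTL_MAGIC: u32 = 0x94;

// Field positions of the generic ioctl encoding (assumed, see above).
const NR_SHIFT: u32 = 0;
const TYPE_SHIFT: u32 = 8;
const SIZE_SHIFT: u32 = 16;
const DIR_SHIFT: u32 = 30;

pub const IOC_NONE: u32 = 0;
pub const IOC_WRITE: u32 = 1;
pub const IOC_READ: u32 = 2;

#[derive(Debug, PartialEq, Eq)]
pub struct IoctlFields {
    pub dir: u32,
    pub magic: u32,
    pub nr: u32,
    pub size: u32,
}

/// Packs direction, magic, command number and argument size into a request code.
pub const fn ioc(dir: u32, magic: u32, nr: u32, size: u32) -> u32 {
    (dir << DIR_SHIFT) | (size << SIZE_SHIFT) | (magic << TYPE_SHIFT) | (nr << NR_SHIFT)
}

/// Splits a request code back into its four fields.
pub const fn decode(request: u32) -> IoctlFields {
    IoctlFields {
        dir: request >> DIR_SHIFT,
        magic: (request >> TYPE_SHIFT) & 0xff,
        nr: (request >> NR_SHIFT) & 0xff,
        size: (request >> SIZE_SHIFT) & 0x3fff,
    }
}

/// True if the request code carries the btrfs magic number.
pub const fn is_btrfs(request: u32) -> bool {
    decode(request).magic == BTRFS_IOCTL_MAGIC
}
#+end_src

For example, ~ioc(IOC_NONE, BTRFS_IOCTL_MAGIC, 8, 0)~ evaluates to ~0x9408~, so the magic number is visible as the second-lowest byte of any btrfs request code. In real code the ~nix~ crate's ioctl macros are probably a better fit than hand-rolled shifts, since they account for the target architecture.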
** ZFS
:PROPERTIES:
:ID: 219cb1b5-7dff-4800-8ad0-2a19309a9e9f
:END:
-- [cite/t/f:@zfs]

- core component of TrueNAS software
** TMPFS
:PROPERTIES:
:ID: f2167ca1-f751-4ee3-a398-4cf7fff6b57c
:END:
-- [cite/t/f:@tmpfs]
- in-mem FS
** EXT4
:PROPERTIES:
:ID: 698fb02f-dc73-40be-8bac-0af3a03c39c6
:END:
-- [cite/t/f:@ext4]
** XFS
:PROPERTIES:
:ID: 8c6cf1e4-1555-4270-a101-40b6fbb0a1f9
:END:
-- [cite/t/f:@xfs]
-- [cite/t/f:@xfs-scalability]
* Storage Mediums
:PROPERTIES:
:ID: e3701458-b333-44e3-b6f2-12861d6287ed
:END:
** HDD
:PROPERTIES:
:ID: dfe51b59-5f9d-4e7d-86f1-27e51453ae1f
:END:
-- [cite/t/f:@hd-failure-ml]
** SSD
:PROPERTIES:
:ID: 9c9a5470-d0c2-49f1-9dc9-d0d62c841a19
:END:
-- [cite/t/f:@smart-ssd-qp]
-- [cite/t/f:@ssd-perf-opt]

** Flash
:PROPERTIES:
:ID: 1414bda3-7fae-4fe0-ae65-8a5ed05ad822
:END:
-- [cite/t/f:@flash-openssd-systems]
** NVMe
:PROPERTIES:
:ID: 95e44402-9235-4b4b-a772-b91d78e38a6b
:END:
-- [cite/t/f:@nvme-ssd-ux]
-- [[https://nvmexpress.org/specifications/][specifications]]
*** ZNS
:PROPERTIES:
:ID: 5639429c-1c9d-4cf0-b69c-cc45528cac50
:END:
-- [cite/t/f:@zns-usenix]
#+begin_quote
Zoned Storage is an open source, standards-based initiative to enable data centers to
scale efficiently for the zettabyte storage capacity era. There are two technologies
behind Zoned Storage, Shingled Magnetic Recording (SMR) in ATA/SCSI HDDs and Zoned
Namespaces (ZNS) in NVMe SSDs.
#+end_quote
-- [[https://zonedstorage.io/][zonedstorage.io]]
-- $465 8tb 2.5"? [[https://www.serversupply.com/SSD/PCI-E/7.68TB/WESTERN%20DIGITAL/WUS4BB076D7P3E3_332270.htm][retail]]
** eMMC
:PROPERTIES:
:ID: b8539369-0e0f-4f23-be8f-cd38be031bac
:END:
-- [cite/t/f:@emmc-mobile-io]
* Linux
:PROPERTIES:
:ID: b244015e-f3e2-4837-8186-a2f5edef1f14
:END:
** syscalls
:PROPERTIES:
:ID: 935f67e0-eef6-4913-9dcc-8530129be37c
:END:
*** ioctl
:PROPERTIES:
:ID: 134d256a-f7b3-4603-846c-b6c9bad2d708
:END:
- [[https://elixir.bootlin.com/linux/latest/source/Documentation/userspace-api/ioctl/ioctl-number.rst][ioctl-numbers]]
* Rust
:PROPERTIES:
:ID: a3b9e17a-75a6-4aca-bf96-b713bc2ded43
:END:
** crates
:PROPERTIES:
:ID: 4e7e4fb5-55f7-4036-b568-b84cefa45de8
:END:
*** nix
:PROPERTIES:
:ID: 861e5180-14b4-47c9-a779-fe25c0428d7e
:END:
- [[https://crates.io/crates/nix][crates.io]]
*** memmap2
:PROPERTIES:
:ID: 320428ab-2d0f-4390-978f-c89907f8d0f4
:END:
- [[https://crates.io/crates/memmap2][crates.io]]
*** zstd
:PROPERTIES:
:ID: 1cf1597d-2f42-4b92-b8fc-a88c649f7cbf
:END:
- [[https://crates.io/crates/zstd][crates.io]]
*** rocksdb
:PROPERTIES:
:ID: 1f8fae07-2fbb-4a35-8269-ea436f846193
:END:
- [[https://crates.io/crates/rocksdb][crates.io]]
*** tokio :tokio:
:PROPERTIES:
:ID: ec55c0e1-7862-4f05-a6ee-b59ffc68a8ff
:END:
- [[https://crates.io/crates/tokio][crates.io]]
*** tracing :tokio:
:PROPERTIES:
:ID: bcf1904b-184e-4cae-86e7-5fcf57762944
:END:
- [[https://crates.io/crates/tracing][crates.io]]
**** tracing-subscriber
:PROPERTIES:
:ID: 85f6ed51-f3f3-489c-911a-e90c4974048e
:END:
- [[https://crates.io/crates/tracing-subscriber][crates.io]]
*** axum :tokio:
:PROPERTIES:
:ID: 29e0fb8d-e35a-4e47-9b11-45cc4019e2db
:END:
- [[https://crates.io/crates/axum][crates.io]]
*** tower :tokio:
:PROPERTIES:
:ID: 8e5a71ed-85d6-4562-be3a-9261ab376a0e
:END:
- [[https://crates.io/crates/tower][crates.io]]
*** uuid
:PROPERTIES:
:ID: f6f24187-53b1-408e-b3ac-a101c9ba3040
:END:
- [[https://crates.io/crates/uuid][crates.io]]
** unstable
:PROPERTIES:
:ID: c09c812a-e884-4a28-ac4b-4f997ad2e932
:END:
*** lazy_cell
:PROPERTIES:
:ID: 990d862d-80b1-4620-aa6a-d5e1a2c23517
:END:
- [[https://github.com/rust-lang/rust/issues/109736][tracking-issue]]
*** {BTreeMap,BTreeSet}::extract_if
:PROPERTIES:
:ID: 15bcf475-336a-4ed0-9b1d-921414c4ff9a
:END:
- [[https://github.com/rust-lang/rust/issues/70530][tracking-issue]]
* Lisp
:PROPERTIES:
:ID: 5aac7727-f53a-4414-9d6b-2cb50fb45c87
:END:
** ASDF
:PROPERTIES:
:ID: 043ab5da-6f3f-47ee-b9cf-ba8f0c7bb87c
:END:
- [[https://gitlab.common-lisp.net/asdf/asdf][gitlab.common-lisp.net]]
- [[https://asdf.common-lisp.dev/][common-lisp.dev]]
- [[https://github.com/fare/asdf/blob/master/doc/best_practices.md][best-practices]]
- includes UIOP
** Reference Projects
:PROPERTIES:
:ID: f25d3c51-7338-484c-9068-31c1a4c7a565
:END:
*** StumpWM
:PROPERTIES:
:ID: 23dcbfef-b703-4dc5-a60a-9f2be66e32f2
:END:
- [[https://github.com/stumpwm/stumpwm][github]]
*** Nyxt
:PROPERTIES:
:ID: bfbb355d-2b09-4450-b39a-368a5f685d77
:END:
- [[https://github.com/atlas-engineer/nyxt][github]]
*** Kons-9
:PROPERTIES:
:ID: 19929b73-2a4c-43f1-b04c-ec88dfa209bd
:END:
- [[https://github.com/kaveh808/kons-9][github]]
*** cl-torrents
:PROPERTIES:
:ID: 935dbb0f-2f04-46c1-b250-48dea359398d
:END:
- [[https://github.com/vindarel/cl-torrents][github]]
*** Mezzano
:PROPERTIES:
:ID: 2da96a46-f71c-436a-ab60-8f2a30469b15
:END:
- [[https://github.com/froggey/Mezzano][github]]
*** yalo
:PROPERTIES:
:ID: 38459f44-90e3-4fcc-a829-27c60e28b2cd
:END:
- [[https://github.com/whily/yalo][github]]
*** cl-ledger
:PROPERTIES:
:ID: d2835614-461c-4bdd-8f25-a055b51797f4
:END:
- [[https://github.com/ledger/cl-ledger][github]]
*** Lem
:PROPERTIES:
:ID: 8aa32222-31a9-4dc5-999c-7f10a9649d9f
:END:
- [[https://github.com/lem-project/lem][github]]
*** kindista
:PROPERTIES:
:ID: 977ccbf4-ca2f-466b-9420-105df90cfcdc
:END:
- [[https://github.com/kindista/kindista][github]]
*** lisp-chat
:PROPERTIES:
:ID: c745e1c8-a675-4cfe-bb7f-30916b9198dd
:END:
- [[https://github.com/ryukinix/lisp-chat][github]]
* Refs
:PROPERTIES:
:ID: 9a03c3b2-e9b6-4ab8-a1c5-3517374afbf0
:END: