:PROPERTIES:
:ID: 29f7085e-53a3-4d70-90a7-e3437ee99775
:END:

:PROPERTIES:
:ID: 3e7bb82e-7dae-476a-8ed0-18c361c9bd1b
:END:

BTRFS is a Linux filesystem based on copy-on-write, allowing for
efficient snapshots and clones.

It uses B-trees as its main on-disk data structure. The design goal is
to work well for many use cases and workloads. To this end, much
effort has been directed to maintaining even performance as the
filesystem ages, rather than trying to support a particular narrow
benchmark use-case.

Linux filesystems are installed on smartphones as well as enterprise
servers. This entails challenges on many different fronts.

- Scalability :: The filesystem must scale in many dimensions: disk
  space, memory, and CPUs.

- Data integrity :: Losing data is not an option, and much effort is
  expended to safeguard the content. This includes checksums, metadata
  duplication, and RAID support built into the filesystem.

- Disk diversity :: The system should work well with SSDs and hard
  disks. It is also expected to be able to use an array of different
  sized disks, which poses challenges to the RAID and striping
  mechanisms.

*** [2023-08-08 Tue] btrfs performance speculation
:PROPERTIES:
:ID: 2b662144-97b2-4736-a8fc-bc8f861b9829
:END:
- [[https://www.percona.com/blog/taking-a-look-at-btrfs-for-mysql/]]
  - zfs outperforms immensely, but potential misconfiguration on btrfs side (virt+cow
- https://www.ctrl.blog/entry/btrfs-vs-ext4-performance.html
  - see the follow up comment on this post
    - https://www.reddit.com/r/archlinux/comments/o2gc42/is_the_performance_hit_of_btrfs_serious_is_it/
      #+begin_quote
      I’m the author of OP’s first link. I use BtrFS today. I often shift lots of
      de-duplicatable data around, and benefit greatly from file cloning. The data is actually
      the same data that caused the slow performance in the article. BtrFS and file cloning
      now performs this task quicker than a traditional file system. (Hm. It’s time for a

      In a laptop with one drive: it doesn’t matter too much unless you do work that benefit
      from file cloning or snapshots. This will likely require you to adjust your tooling and
      workflow. I’ve had to rewrite the software I use every day to make it take advantage of
      the capabilities of a more modern file system. You won’t benefit much from the data
      recovery and redundancy features unless you’ve got two storage drives in your laptop and
      can setup redundant data copies.

      on similar hardware to mine?

      It’s not a question about your hardware as much as how you use it. The bad performance I
      documented was related to lots and lots of simultaneous random reads and writes. This
      might not be representative of how you use your computer.
      #+end_quote
- https://dl.acm.org/doi/fullHtml/10.1145/3386362
  - this is about distributed file systems (in this case Ceph) - they argue against
    basing DFS on ondisk-format filesystems (XFS ext4) - developed BlueStore as
    backend, which runs directly on raw storage hardware.
  - this is a good approach, but expensive (2 years in development) and risky
  - better approach is to take advantage of a powerful enough existing ondisk-FS
    format and pair it with supporting modules which abstract away the 'distributed'
  - the strategy presented here is critical for enterprise-grade hardware where the
    ondisk filesystem becomes the bottleneck that you're looking to optimize
- https://lore.kernel.org/lkml/cover.1676908729.git.dsterba@suse.com/
  - linux 6.3 patch by David Sterba [2023-02-20 Mon]
  - btrfs continues to show improvements in the linux kernel, ironing out the kinks
  - makes it hard to compare benchmarks tho :/

:PROPERTIES:
:ID: 9ddb1caf-014a-4d0f-972e-82028e8be286
:END:
- see this WIP k-ext for macos: [[https://github.com/relalis/macos-btrfs][macos-btrfs]]
  - maybe we can help out with the VFS/mount support

:PROPERTIES:
:ID: 7535f844-330b-4f9c-b2a1-4578d64acbf7
:END:
- [[https://btrfs.readthedocs.io/en/latest/dev/On-disk-format.html][on-disk-format]]
- 'btrfs consists entirely of several trees. the trees use copy-on-write.'
- trees are stored in nodes which belong to a level in the b-tree structure.
  - internal nodes contain refs to other internal nodes on the /next/ level, OR
    to leaf nodes when the level reaches 0.
  - leaf nodes contain various types depending on the tree.
- key fields (offset:size, offsets in hex)
  - 0:8 uint = objectid, each tree has its own set of object IDs
  - 8:1 uint = item type
  - 9:8 uint = offset, depends on type.
  - fields are unsigned
- superblock
  - primary superblock is located at 0x10000 (64KiB)
  - Mirror copies of the superblock are located at physical addresses 0x4000000 (64
    MiB) and 0x4000000000 (256GiB), if valid. copies are updated simultaneously.
  - during mount only the first super block at 0x10000 is read, error causes mount to
    fail.
  - BTRFS only recognizes disks with a valid 0x10000 superblock.
- node header
  - stored at the start of every node
  - data following it depends on whether it is an internal or leaf node.
- internal nodes
  - node header followed by a number of key pointers (each starts with a 0x11-byte key)
    - 0x11:8 uint = block number
    - 0x19:8 uint = generation
- leaf nodes
  - leaf nodes contain header followed by items (each starts with a 0x11-byte key)
    - 0x11:4 uint = data offset relative to end of header (0x65)
    - 0x15:4 uint = data size
- root tree
  - holds ROOT_ITEMs, ROOT_REFs, and ROOT_BACKREFs for every tree other than itself.
  - used to find the other trees and to determine the subvol structure.
  - holds items for the 'root tree directory'. laddr (logical address) of the root
    tree is stored in the superblock
  - free ids: BTRFS_FIRST_FREE_OBJECTID=256ULL:BTRFS_LAST_FREE_OBJECTID=-256ULL
    - otherwise used for internal use
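
The field offsets noted above can be checked with a small parser. This is a minimal sketch, assuming little-endian packed structures and hex offsets as in the linked on-disk-format page; the names =parse_key=, =parse_key_ptr= and =parse_item= are made up for illustration and are not part of any btrfs tooling.

```python
import struct

# Layouts taken from the notes above: unsigned, little-endian, no padding.
KEY = struct.Struct("<QBQ")        # objectid, item type, offset
KEY_PTR = struct.Struct("<QBQQQ")  # key, block number, generation
ITEM = struct.Struct("<QBQII")     # key, data offset, data size

HEADER_SIZE = 0x65  # node header length; item data offsets are relative to its end

# Primary superblock and its two mirror copies (physical addresses).
SUPERBLOCK_OFFSETS = (0x10000, 0x4000000, 0x4000000000)

# -256ULL wraps around as an unsigned 64-bit value.
FIRST_FREE_OBJECTID = 256
LAST_FREE_OBJECTID = (-256) & 0xFFFFFFFFFFFFFFFF


def parse_key(buf, off=0):
    """Decode the 0x11-byte key found at the start of key pointers and items."""
    objectid, item_type, offset = KEY.unpack_from(buf, off)
    return {"objectid": objectid, "type": item_type, "offset": offset}


def parse_key_ptr(buf, off=0):
    """Decode one key pointer of an internal node."""
    *_, blocknr, generation = KEY_PTR.unpack_from(buf, off)
    return {"key": parse_key(buf, off), "blocknr": blocknr, "generation": generation}


def parse_item(buf, off=0):
    """Decode one item of a leaf node and locate its data inside the node."""
    *_, data_offset, data_size = ITEM.unpack_from(buf, off)
    start = HEADER_SIZE + data_offset
    return {"key": parse_key(buf, off), "data_range": (start, start + data_size)}


if __name__ == "__main__":
    raw = ITEM.pack(256, 1, 0, 0x100, 0x20)  # synthetic item, not read from a disk
    print(parse_item(raw))
    print([hex(o) for o in SUPERBLOCK_OFFSETS], hex(LAST_FREE_OBJECTID))
```

The struct sizes come out to 0x11, 0x21 and 0x19 bytes, which lines up with the field offsets listed in the notes.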
*** send-stream format
:PROPERTIES:
:ID: 1d1a6211-c91f-48ae-8113-0ddada286cee
:END:
- [[https://btrfs.readthedocs.io/en/latest/dev/dev-send-stream.html][send stream format]]
- Send stream format represents a linear sequence of commands describing actions to be
  performed on the target filesystem (receive side), created on the source filesystem
  (send side).
- The stream is currently used in two ways: to generate a stream representing a
  standalone subvolume (full mode) or a difference between two snapshots of the same
  subvolume (incremental mode).
- The stream can be generated using a set of other subvolumes to look for extent
  references that could lead to a more efficient stream by transferring only the
  references and not full data.
- The stream format is abstracted from on-disk structures (though it may share some
  BTRFS specifics), the stream instructions could be generated by other means than the
  send ioctl.
- it's a checksum+TLV
  - header: u32 len, u16 cmd, u32 crc32c
  - data: type, length, raw data
- the v2 protocol supports the encoded commands
- the commands are kinda clunky - need to MKFILE/MKDIR then RENAME to create
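
The checksum+TLV framing can be sketched as follows. Assumptions not stated in the notes: fields are little-endian, =len= counts only the attribute bytes after the header, and each attribute carries a u16 type and u16 length. The command and attribute numbers are placeholders (check =send.h= for the real values), and the crc32c field is carried through without being verified.

```python
import struct

CMD_HEADER = struct.Struct("<IHI")  # u32 len, u16 cmd, u32 crc32c
TLV_HEADER = struct.Struct("<HH")   # u16 type, u16 length (assumed widths)

# Placeholder numbers for illustration only.
CMD_MKFILE, CMD_RENAME = 3, 9
ATTR_PATH, ATTR_PATH_TO = 15, 16


def build_command(cmd, attrs, crc=0):
    """Serialize one command: header followed by its TLV attributes."""
    payload = b"".join(TLV_HEADER.pack(t, len(v)) + v for t, v in attrs)
    return CMD_HEADER.pack(len(payload), cmd, crc) + payload


def parse_command(buf, off=0):
    """Return (cmd, crc, attrs, next_offset) for the command starting at off."""
    length, cmd, crc = CMD_HEADER.unpack_from(buf, off)
    pos = off + CMD_HEADER.size
    end = pos + length
    attrs = []
    while pos < end:
        attr_type, attr_len = TLV_HEADER.unpack_from(buf, pos)
        pos += TLV_HEADER.size
        attrs.append((attr_type, bytes(buf[pos:pos + attr_len])))
        pos += attr_len
    return cmd, crc, attrs, end


def parse_stream(buf):
    """Walk a buffer of back-to-back commands (no stream header handling)."""
    off, commands = 0, []
    while off < len(buf):
        cmd, _crc, attrs, off = parse_command(buf, off)
        commands.append((cmd, attrs))
    return commands


if __name__ == "__main__":
    # Create a file under a temporary name, then rename it into place.
    stream = build_command(CMD_MKFILE, [(ATTR_PATH, b"tmp-name")])
    stream += build_command(
        CMD_RENAME, [(ATTR_PATH, b"tmp-name"), (ATTR_PATH_TO, b"hello.txt")]
    )
    for cmd, attrs in parse_stream(stream):
        print(cmd, attrs)
```

The two-command example mirrors the create-then-rename pattern noted above: the receiver first sees the object under a temporary name and only learns its final path from the RENAME.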
*** [2023-08-09 Wed] ioctls
:PROPERTIES:
:ID: be04dc90-86a6-46c5-9dcf-25519ebed34d
:END:
- https://docs.kernel.org/userspace-api/ioctl/ioctl-number.html
  - Btrfs filesystem ioctls, some lifted to vfs/generic
  - defined in fs/btrfs/ioctl.h and linux/fs.h

:PROPERTIES:
:ID: 219cb1b5-7dff-4800-8ad0-2a19309a9e9f
:END:
- core component of TrueNAS software

:PROPERTIES:
:ID: f2167ca1-f751-4ee3-a398-4cf7fff6b57c
:END:

:PROPERTIES:
:ID: 698fb02f-dc73-40be-8bac-0af3a03c39c6
:END:

:PROPERTIES:
:ID: 8c6cf1e4-1555-4270-a101-40b6fbb0a1f9
:END:
-- [cite/t/f:@xfs-scalability]

:PROPERTIES:
:ID: e3701458-b333-44e3-b6f2-12861d6287ed
:END:

:PROPERTIES:
:ID: dfe51b59-5f9d-4e7d-86f1-27e51453ae1f
:END:
-- [cite/t/f:@hd-failure-ml]

:PROPERTIES:
:ID: 9c9a5470-d0c2-49f1-9dc9-d0d62c841a19
:END:
-- [cite/t/f:@smart-ssd-qp]
-- [cite/t/f:@ssd-perf-opt]

:PROPERTIES:
:ID: 1414bda3-7fae-4fe0-ae65-8a5ed05ad822
:END:
-- [cite/t/f:@flash-openssd-systems]

:PROPERTIES:
:ID: 95e44402-9235-4b4b-a772-b91d78e38a6b
:END:
-- [cite/t/f:@nvme-ssd-ux]
-- [[https://nvmexpress.org/specifications/][specifications]]

:PROPERTIES:
:ID: 5639429c-1c9d-4cf0-b69c-cc45528cac50
:END:
-- [cite/t/f:@zns-usenix]
#+begin_quote
Zoned Storage is an open source, standards-based initiative to enable data centers to
scale efficiently for the zettabyte storage capacity era. There are two technologies
behind Zoned Storage, Shingled Magnetic Recording (SMR) in ATA/SCSI HDDs and Zoned
Namespaces (ZNS) in NVMe SSDs.
#+end_quote
-- [[https://zonedstorage.io/][zonedstorage.io]]
-- $465 8tb 2.5"? [[https://www.serversupply.com/SSD/PCI-E/7.68TB/WESTERN%20DIGITAL/WUS4BB076D7P3E3_332270.htm][retail]]

:PROPERTIES:
:ID: b8539369-0e0f-4f23-be8f-cd38be031bac
:END:
-- [cite/t/f:@emmc-mobile-io]

:PROPERTIES:
:ID: b244015e-f3e2-4837-8186-a2f5edef1f14
:END:

:PROPERTIES:
:ID: 935f67e0-eef6-4913-9dcc-8530129be37c
:END:

:PROPERTIES:
:ID: 134d256a-f7b3-4603-846c-b6c9bad2d708
:END:
- [[https://elixir.bootlin.com/linux/latest/source/Documentation/userspace-api/ioctl/ioctl-number.rst][ioctl-numbers]]

:PROPERTIES:
:ID: a3b9e17a-75a6-4aca-bf96-b713bc2ded43
:END:

:PROPERTIES:
:ID: 4e7e4fb5-55f7-4036-b568-b84cefa45de8
:END:

:PROPERTIES:
:ID: 861e5180-14b4-47c9-a779-fe25c0428d7e
:END:
- [[https://crates.io/crates/nix][crates.io]]

:PROPERTIES:
:ID: 320428ab-2d0f-4390-978f-c89907f8d0f4
:END:
- [[https://crates.io/crates/memmap2][crates.io]]

:PROPERTIES:
:ID: 1cf1597d-2f42-4b92-b8fc-a88c649f7cbf
:END:
- [[https://crates.io/crates/zstd][crates.io]]

:PROPERTIES:
:ID: 1f8fae07-2fbb-4a35-8269-ea436f846193
:END:
- [[https://crates.io/crates/rocksdb][crates.io]]

:PROPERTIES:
:ID: ec55c0e1-7862-4f05-a6ee-b59ffc68a8ff
:END:
- [[https://crates.io/crates/tokio][crates.io]]

:PROPERTIES:
:ID: bcf1904b-184e-4cae-86e7-5fcf57762944
:END:
- [[https://crates.io/crates/tracing][crates.io]]
**** tracing-subscriber
:PROPERTIES:
:ID: 85f6ed51-f3f3-489c-911a-e90c4974048e
:END:
- [[https://crates.io/crates/tracing-subscriber][crates.io]]

:PROPERTIES:
:ID: 29e0fb8d-e35a-4e47-9b11-45cc4019e2db
:END:
- [[https://crates.io/crates/axum][crates.io]]

:PROPERTIES:
:ID: 8e5a71ed-85d6-4562-be3a-9261ab376a0e
:END:
- [[https://crates.io/crates/tower][crates.io]]

:PROPERTIES:
:ID: f6f24187-53b1-408e-b3ac-a101c9ba3040
:END:
- [[https://crates.io/crates/uuid][crates.io]]

:PROPERTIES:
:ID: c09c812a-e884-4a28-ac4b-4f997ad2e932
:END:

:PROPERTIES:
:ID: 990d862d-80b1-4620-aa6a-d5e1a2c23517
:END:
- [[https://github.com/rust-lang/rust/issues/109736][tracking-issue]]
*** {BTreeMap,BTreeSet}::extract_if
:PROPERTIES:
:ID: 15bcf475-336a-4ed0-9b1d-921414c4ff9a
:END:
- [[https://github.com/rust-lang/rust/issues/70530][tracking-issue]]

:PROPERTIES:
:ID: 5aac7727-f53a-4414-9d6b-2cb50fb45c87
:END:

:PROPERTIES:
:ID: 043ab5da-6f3f-47ee-b9cf-ba8f0c7bb87c
:END:
- [[https://gitlab.common-lisp.net/asdf/asdf][gitlab.common-lisp.net]]
- [[https://asdf.common-lisp.dev/][common-lisp.dev]]
- [[https://github.com/fare/asdf/blob/master/doc/best_practices.md][best-practices]]
** Reference Projects
:PROPERTIES:
:ID: f25d3c51-7338-484c-9068-31c1a4c7a565
:END:

:PROPERTIES:
:ID: 23dcbfef-b703-4dc5-a60a-9f2be66e32f2
:END:
- [[https://github.com/stumpwm/stumpwm][github]]

:PROPERTIES:
:ID: bfbb355d-2b09-4450-b39a-368a5f685d77
:END:
- [[https://github.com/atlas-engineer/nyxt][github]]

:PROPERTIES:
:ID: 19929b73-2a4c-43f1-b04c-ec88dfa209bd
:END:
- [[https://github.com/kaveh808/kons-9][github]]

:PROPERTIES:
:ID: 935dbb0f-2f04-46c1-b250-48dea359398d
:END:
- [[https://github.com/vindarel/cl-torrents][github]]

:PROPERTIES:
:ID: 2da96a46-f71c-436a-ab60-8f2a30469b15
:END:
- [[https://github.com/froggey/Mezzano][github]]

:PROPERTIES:
:ID: 38459f44-90e3-4fcc-a829-27c60e28b2cd
:END:
- [[https://github.com/whily/yalo][github]]

:PROPERTIES:
:ID: d2835614-461c-4bdd-8f25-a055b51797f4
:END:
- [[https://github.com/ledger/cl-ledger][github]]

:PROPERTIES:
:ID: 8aa32222-31a9-4dc5-999c-7f10a9649d9f
:END:
- [[https://github.com/lem-project/lem][github]]

:PROPERTIES:
:ID: 977ccbf4-ca2f-466b-9420-105df90cfcdc
:END:
- [[https://github.com/kindista/kindista][github]]

:PROPERTIES:
:ID: c745e1c8-a675-4cfe-bb7f-30916b9198dd
:END:
- [[https://github.com/ryukinix/lisp-chat][github]]

:PROPERTIES:
:ID: 9a03c3b2-e9b6-4ab8-a1c5-3517374afbf0
:END: