#+BIBLIOGRAPHY: refs.bib

BTRFS is a Linux filesystem based on copy-on-write, allowing for
efficient snapshots and clones.

It uses B-trees as its main on-disk data structure. The design goal is
to work well for many use cases and workloads. To this end, much
effort has been directed to maintaining even performance as the
filesystem ages, rather than trying to support a particular narrow

Linux filesystems are installed on smartphones as well as enterprise
servers. This entails challenges on many different fronts.

- Scalability :: The filesystem must scale in many dimensions: disk
  space, memory, and CPUs.

- Data integrity :: Losing data is not an option, and much effort is
  expended to safeguard the content. This includes checksums, metadata
  duplication, and RAID support built into the filesystem.

- Disk diversity :: The system should work well with SSDs and hard
  disks. It is also expected to be able to use an array of different
  sized disks, which poses challenges to the RAID and striping

*** [2023-08-08 Tue] btrfs performance speculation ::
- [[https://www.percona.com/blog/taking-a-look-at-btrfs-for-mysql/]]
  - zfs outperforms immensely, but potential misconfiguration on btrfs side (virt+cow
- https://www.ctrl.blog/entry/btrfs-vs-ext4-performance.html
  - see the follow up comment on this post
  - https://www.reddit.com/r/archlinux/comments/o2gc42/is_the_performance_hit_of_btrfs_serious_is_it/

    I’m the author of OP’s first link. I use BtrFS today. I often shift lots of
    de-duplicatable data around, and benefit greatly from file cloning. The data is actually
    the same data that caused the slow performance in the article. BtrFS and file cloning
    now performs this task quicker than a traditional file system. (Hm. It’s time for a

    In a laptop with one drive: it doesn’t matter too much unless you do work that benefit
    from file cloning or snapshots. This will likely require you to adjust your tooling and
    workflow. I’ve had to rewrite the software I use every day to make it take advantage of
    the capabilities of a more modern file system. You won’t benefit much from the data
    recovery and redundancy features unless you’ve got two storage drives in your laptop and
    can setup redundant data copies.

    on similar hardware to mine?

    It’s not a question about your hardware as much as how you use it. The bad performance I
    documented was related to lots and lots of simultaneous random reads and writes. This
    might not be representative of how you use your computer.

- https://dl.acm.org/doi/fullHtml/10.1145/3386362
  - this is about distributed file systems (in this case Ceph) - they argue against
    basing DFS on ondisk-format filesystems (XFS ext4) - developed BlueStore as
    backend, which runs directly on raw storage hardware.
  - this is a good approach, but expensive (2 years in development) and risky
  - better approach is to take advantage of a powerful enough existing ondisk-FS
    format and pair it with supporting modules which abstract away the 'distributed'
  - the strategy presented here is critical for enterprise-grade hardware where the
    ondisk filesystem becomes the bottleneck that you're looking to optimize
- https://lore.kernel.org/lkml/cover.1676908729.git.dsterba@suse.com/
  - linux 6.3 patch by David Sterba [2023-02-20 Mon]
  - btrfs continues to show improvements in the linux kernel, ironing out the kinks
  - makes it hard to compare benchmarks tho :/

- see this WIP k-ext for macos: [[https://github.com/relalis/macos-btrfs][macos-btrfs]]
  - maybe we can help out with the VFS/mount support

- [[https://btrfs.readthedocs.io/en/latest/dev/On-disk-format.html][on-disk-format]]
  - 'btrfs consists entirely of several trees. the trees use copy-on-write.'
  - trees are stored in nodes which belong to a level in the b-tree structure.
  - internal nodes contain refs to other internal nodes on the /next/ level OR
    to leaf nodes when the level reaches 0.
  - leaf nodes contain various types depending on the tree.
  - key fields, written as offset:size with hexadecimal offsets
    - 0:8 uint = objectid, each tree has its own set of object IDs
    - 8:1 uint = item type
    - 9:8 uint = offset, depends on type.
  - primary superblock is located at 0x10000 (64KiB)
  - Mirror copies of the superblock are located at physical addresses 0x4000000 (64
    MiB) and 0x4000000000 (256GiB), if valid. copies are updated simultaneously.
  - during mount only the first super block at 0x10000 is read, error causes mount to
  - BTRFS only recognizes disks with a valid 0x10000 superblock.
  - node header: stored at the start of every node
    - data following it depends on whether it is an internal or leaf node.
  - internal nodes: node header followed by a number of key pointers
    - 11:8 uint = block number
    - 19:8 uint = generation
  - leaf nodes: node header followed by a number of items
    - 11:4 uint = data offset relative to end of header (0x65)
    - 15:4 uint = data size
  - root tree: holds ROOT_ITEMs, ROOT_REFs, and ROOT_BACKREFs for every tree other than itself.
    - used to find the other trees and to determine the subvol structure.
    - holds items for the 'root tree directory'. its logical address is stored in the superblock
    - free ids: BTRFS_FIRST_FREE_OBJECTID=256ULL:BTRFS_LAST_FREE_OBJECTID=-256ULL
    - object IDs outside that range are reserved for internal use
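
A minimal sketch of how the key, key pointer, and item layouts noted above could be decoded, assuming the offsets are hexadecimal (as in the linked on-disk-format page) and that fields are packed little-endian. The function and dictionary names are illustrative only and do not come from any btrfs library. The 4096-byte superblock size used to decide whether a mirror fits on a device is also an assumption that is not stated in these notes.

```python
import struct

# Packed little-endian layouts; "<" disables alignment padding.
DISK_KEY = struct.Struct("<QBQ")    # objectid, item type, offset  (0x11 bytes)
KEY_PTR = struct.Struct("<QBQQQ")   # key, block number, generation (0x21 bytes)
ITEM = struct.Struct("<QBQII")      # key, data offset, data size   (0x19 bytes)
HEADER_SIZE = 0x65                  # node header length, per the notes

# Primary superblock plus the two mirror locations listed in the notes.
SUPERBLOCK_OFFSETS = (0x10000, 0x4000000, 0x4000000000)
SUPERBLOCK_SIZE = 4096              # assumption, not taken from the notes


def parse_key(buf, pos=0):
    objectid, item_type, offset = DISK_KEY.unpack_from(buf, pos)
    return {"objectid": objectid, "type": item_type, "offset": offset}


def parse_key_ptr(buf, pos=0):
    """Decode one key pointer from an internal node."""
    objectid, item_type, offset, blocknr, generation = KEY_PTR.unpack_from(buf, pos)
    key = {"objectid": objectid, "type": item_type, "offset": offset}
    return {"key": key, "blocknr": blocknr, "generation": generation}


def parse_item(buf, pos=0):
    """Decode one item descriptor from a leaf node."""
    objectid, item_type, offset, data_offset, data_size = ITEM.unpack_from(buf, pos)
    key = {"objectid": objectid, "type": item_type, "offset": offset}
    return {"key": key, "data_offset": data_offset, "data_size": data_size}


def item_data_range(item):
    """Byte range of the item payload inside its leaf block.

    The stored data offset is relative to the end of the node header.
    """
    start = HEADER_SIZE + item["data_offset"]
    return start, start + item["data_size"]


def superblock_offsets(device_size):
    """Superblock locations that fit entirely on a device of this size."""
    return [o for o in SUPERBLOCK_OFFSETS if o + SUPERBLOCK_SIZE <= device_size]
```

For example, =superblock_offsets(1 << 30)= keeps only the primary copy and the 64 MiB mirror, since the 256 GiB mirror lies beyond a 1 GiB device.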
*** send-stream format
- [[https://btrfs.readthedocs.io/en/latest/dev/dev-send-stream.html][send stream format]]
- Send stream format represents a linear sequence of commands describing actions to be
  performed on the target filesystem (receive side), created on the source filesystem
- The stream is currently used in two ways: to generate a stream representing a
  standalone subvolume (full mode) or a difference between two snapshots of the same
  subvolume (incremental mode).
- The stream can be generated using a set of other subvolumes to look for extent
  references that could lead to a more efficient stream by transferring only the
  references and not full data.
- The stream format is abstracted from on-disk structures (though it may share some
  BTRFS specifics), the stream instructions could be generated by other means than the
- it's a checksum+TLV
  - header: u32 len, u16 cmd, u32 crc32c
  - data: type, length, raw data
- the v2 protocol supports the encoded commands
- the commands are kinda clunky - need to MKFILE/MKDIR then RENAME to create a file or directory
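
The checksum+TLV framing described above can be sketched as follows. This is a rough illustration under stated assumptions, not a reference parser: it assumes little-endian fields, that =len= counts only the payload after the 10-byte command header, and that each TLV header is a u16 type followed by a u16 length (the notes give the field order but not the widths). It does not verify the crc32c and does not handle any stream-level preamble before the first command. The command and attribute numbers in the usage line are made up.

```python
import struct

CMD_HEADER = struct.Struct("<IHI")  # u32 len, u16 cmd, u32 crc32c
TLV_HEADER = struct.Struct("<HH")   # assumed widths: u16 type, u16 length


def parse_command(buf, pos=0):
    """Decode one command at `pos`; return (command, position of next command)."""
    length, cmd, crc = CMD_HEADER.unpack_from(buf, pos)
    cur = pos + CMD_HEADER.size
    end = cur + length              # assumes len excludes the header
    if end > len(buf):
        raise ValueError("truncated command")

    attrs = []
    while cur < end:
        if cur + TLV_HEADER.size > end:
            raise ValueError("truncated TLV header")
        tlv_type, tlv_len = TLV_HEADER.unpack_from(buf, cur)
        cur += TLV_HEADER.size
        if cur + tlv_len > end:
            raise ValueError("TLV overruns command payload")
        attrs.append((tlv_type, bytes(buf[cur:cur + tlv_len])))
        cur += tlv_len

    # The checksum is returned as read; it is not verified here.
    return {"cmd": cmd, "crc32c": crc, "attrs": attrs}, end


def build_command(cmd, attrs, crc=0):
    """Inverse of parse_command, handy for building test input."""
    payload = b"".join(TLV_HEADER.pack(t, len(v)) + v for t, v in attrs)
    return CMD_HEADER.pack(len(payload), cmd, crc) + payload
```

Round-tripping a synthetic command, for instance =parse_command(build_command(3, [(15, b"foo")]))=, is a quick way to check the framing logic before pointing it at real stream data.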
*** [2023-08-09 Wed] ioctls
- https://docs.kernel.org/userspace-api/ioctl/ioctl-number.html
  - Btrfs filesystem ioctls are defined in fs/btrfs/ioctl.h and linux/fs.h; some were lifted to vfs/generic
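
As a small illustration of how these request numbers are put together, the sketch below reimplements the generic _IOC encoding in Python and derives two values. It assumes the asm-generic bit layout (some architectures use a different one), the btrfs magic byte 0x94 from the ioctl-number table, and a 4-byte C int. FICLONE serves as the example of an ioctl that was lifted from btrfs into linux/fs.h. The lower-case helper names are illustrative.

```python
# Bit layout of an ioctl request number (asm-generic/ioctl.h).
IOC_NRSHIFT = 0
IOC_TYPESHIFT = 8
IOC_SIZESHIFT = 16
IOC_DIRSHIFT = 30

IOC_NONE = 0
IOC_WRITE = 1
IOC_READ = 2

BTRFS_IOCTL_MAGIC = 0x94
SIZEOF_INT = 4  # assumption: 4-byte C int


def ioc(direction, magic, nr, size):
    """Equivalent of the C _IOC(dir, type, nr, size) macro."""
    return (
        (direction << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | (magic << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
    )


def io(magic, nr):
    """Equivalent of _IO: no argument data."""
    return ioc(IOC_NONE, magic, nr, 0)


def iow(magic, nr, size):
    """Equivalent of _IOW: userspace passes `size` bytes to the kernel."""
    return ioc(IOC_WRITE, magic, nr, size)


BTRFS_IOC_SYNC = io(BTRFS_IOCTL_MAGIC, 8)
FICLONE = iow(BTRFS_IOCTL_MAGIC, 9, SIZEOF_INT)

print(hex(BTRFS_IOC_SYNC), hex(FICLONE))
```

On Linux the resulting FICLONE value could then be passed to =fcntl.ioctl(dst_fd, FICLONE, src_fd)= to request a file clone, provided the underlying filesystem supports it.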

- core component of TrueNAS software

-- [cite/t/f:@xfs-scalability]

-- [cite/t/f:@hd-failure-ml]

-- [cite/t/f:@smart-ssd-qp]
-- [cite/t/f:@ssd-perf-opt]

-- [cite/t/f:@flash-openssd-systems]

-- [cite/t/f:@nvme-ssd-ux]
-- [[https://nvmexpress.org/specifications/][specifications]]

-- [cite/t/f:@zns-usenix]

Zoned Storage is an open source, standards-based initiative to enable data centers to
scale efficiently for the zettabyte storage capacity era. There are two technologies
behind Zoned Storage, Shingled Magnetic Recording (SMR) in ATA/SCSI HDDs and Zoned
Namespaces (ZNS) in NVMe SSDs.

-- [[https://zonedstorage.io/][zonedstorage.io]]
-- $465 8tb 2.5"? [[https://www.serversupply.com/SSD/PCI-E/7.68TB/WESTERN%20DIGITAL/WUS4BB076D7P3E3_332270.htm][retail]]

-- [cite/t/f:@emmc-mobile-io]

- [[https://elixir.bootlin.com/linux/latest/source/Documentation/userspace-api/ioctl/ioctl-number.rst][ioctl-numbers]]

- [[https://crates.io/crates/nix][crates.io]]
- [[https://crates.io/crates/memmap2][crates.io]]
- [[https://crates.io/crates/zstd][crates.io]]
- [[https://crates.io/crates/rocksdb][crates.io]]
- [[https://crates.io/crates/tokio][crates.io]]
- [[https://crates.io/crates/tracing][crates.io]]
**** tracing-subscriber
- [[https://crates.io/crates/tracing-subscriber][crates.io]]
- [[https://crates.io/crates/axum][crates.io]]
- [[https://crates.io/crates/tower][crates.io]]
- [[https://crates.io/crates/uuid][crates.io]]

- [[https://github.com/rust-lang/rust/issues/109736][tracking-issue]]
*** {BTreeMap,BTreeSet}::extract_if
- [[https://github.com/rust-lang/rust/issues/70530][tracking-issue]]

- [[https://gitlab.common-lisp.net/asdf/asdf][gitlab.common-lisp.net]]
- [[https://asdf.common-lisp.dev/][common-lisp.dev]]
- [[https://github.com/fare/asdf/blob/master/doc/best_practices.md][best-practices]]

** Reference Projects
- [[https://github.com/stumpwm/stumpwm][github]]
- [[https://github.com/atlas-engineer/nyxt][github]]
- [[https://github.com/kaveh808/kons-9][github]]
- [[https://github.com/vindarel/cl-torrents][github]]
- [[https://github.com/froggey/Mezzano][github]]
- [[https://github.com/whily/yalo][github]]
- [[https://github.com/ledger/cl-ledger][github]]
- [[https://github.com/lem-project/lem][github]]
- [[https://github.com/kindista/kindista][github]]
- [[https://github.com/ryukinix/lisp-chat][github]]

#+print_bibliography: