changeset 28: a0017112db77
parent: bb51c61e4d4b
author: Richard Westhaver <ellis@rwest.io>
date: Thu, 06 Jun 2024 23:15:50 -0400
permissions: -rw-r--r--
description: style update
1 #+title: notes
2 #+author: Richard Westhaver
3 #+email: ellis@rwest.io
4 #+description: NAS-T Notes
5 #+setupfile: ../../clean.theme
6 #+BIBLIOGRAPHY: refs.bib
7 * File Systems
8 ** BTRFS
9 #+begin_quote
10 BTRFS is a Linux filesystem based on copy-on-write, allowing for
11 efficient snapshots and clones.
12 
13 It uses B-trees as its main on-disk data structure. The design goal is
14 to work well for many use cases and workloads. To this end, much
15 effort has been directed to maintaining even performance as the
16 filesystem ages, rather than trying to support a particular narrow
17 benchmark use-case.
18 
19 Linux filesystems are installed on smartphones as well as enterprise
20 servers. This entails challenges on many different fronts.
21 
22 - Scalability :: The filesystem must scale in many dimensions: disk
23  space, memory, and CPUs.
24 
25 - Data integrity :: Losing data is not an option, and much effort is
26  expended to safeguard the content. This includes checksums, metadata
27  duplication, and RAID support built into the filesystem.
28 
29 - Disk diversity :: The system should work well with SSDs and hard
30  disks. It is also expected to be able to use an array of different
31  sized disks, which poses challenges to the RAID and striping
32  mechanisms.
33 #+end_quote
34 -- [cite/t/f:@btrfs]
35 *** [2023-08-08 Tue] btrfs performance speculation ::
36  - [[https://www.percona.com/blog/taking-a-look-at-btrfs-for-mysql/]]
37  - zfs outperforms btrfs by a wide margin, but the btrfs side may be misconfigured
38  (virt + CoW still enabled?)
39  - https://www.ctrl.blog/entry/btrfs-vs-ext4-performance.html
40  - see the follow up comment on this post
41  - https://www.reddit.com/r/archlinux/comments/o2gc42/is_the_performance_hit_of_btrfs_serious_is_it/
42  #+begin_quote
43  I’m the author of OP’s first link. I use BtrFS today. I often shift lots of
44  de-duplicatable data around, and benefit greatly from file cloning. The data is actually
45  the same data that caused the slow performance in the article. BtrFS and file cloning
46  now performs this task quicker than a traditional file system. (Hm. It’s time for a
47  follow-up article.)
48 
49  In a laptop with one drive: it doesn’t matter too much unless you do work that benefit
50  from file cloning or snapshots. This will likely require you to adjust your tooling and
51  workflow. I’ve had to rewrite the software I use every day to make it take advantage of
52  the capabilities of a more modern file system. You won’t benefit much from the data
53  recovery and redundancy features unless you’ve got two storage drives in your laptop and
54  can setup redundant data copies.
55 
56  on similar hardware to mine?
57 
58  It’s not a question about your hardware as much as how you use it. The bad performance I
59  documented was related to lots and lots of simultaneous random reads and writes. This
60  might not be representative of how you use your computer.
61  #+end_quote
62  - https://dl.acm.org/doi/fullHtml/10.1145/3386362
63  - this is about distributed file systems (in this case Ceph) - they argue against
64  basing a DFS on ondisk-format filesystems (XFS, ext4) - they developed BlueStore as
65  the backend, which runs directly on raw storage hardware.
66  - this is a good approach, but expensive (2 years in development) and risky
67  - better approach is to take advantage of a powerful enough existing ondisk-FS
68  format and pair it with supporting modules which abstract away the 'distributed'
69  mechanics.
70  - the strategy presented here is critical for enterprise-grade hardware where the
71  ondisk filesystem becomes the bottleneck that you're looking to optimize
72  - https://lore.kernel.org/lkml/cover.1676908729.git.dsterba@suse.com/
73  - linux 6.3 patch by David Sterba [2023-02-20 Mon]
74  - btrfs continues to show improvements in the linux kernel, ironing out the kinks
75  - makes it hard to compare benchmarks tho :/
76 *** MacOS support
77 - see this WIP kext for macOS: [[https://github.com/relalis/macos-btrfs][macos-btrfs]]
78  - maybe we can help out with the VFS/mount support
79 *** on-disk format
80 - [[https://btrfs.readthedocs.io/en/latest/dev/On-disk-format.html][on-disk-format]]
81 - 'btrfs consists entirely of several trees. the trees use copy-on-write.'
82 - trees are stored in nodes, each of which belongs to a level in the b-tree structure.
83 - internal nodes (inodes) contain refs to other inodes on the /next/ level OR
84  - to leaf nodes when the level reaches 0.
85 - leaf nodes contain various types depending on the tree.
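86 - basic structures (notation is offset:size in bytes, both in hex)
87  - 0:8 uint = objectid, each tree has its own set of object IDs
88  - 8:1 uint = item type
89  - 9:8 uint = offset, depends on type.
90  - these three fields form the 0x11 (17) byte key; all fields are little-endian
91  - fields are unsigned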
92  - *superblock*
93  - primary superblock is located at 0x10000 (64KiB)
94  - Mirror copies of the superblock are located at physical addresses 0x4000000 (64
95  MiB) and 0x4000000000 (256 GiB), if valid. Copies are updated simultaneously.
96  - during mount only the first superblock at 0x10000 is read; an error there causes
97  the mount to fail.
98  - BTRFS only recognizes disks with a valid superblock at 0x10000.
99  - *header*
100  - stored at the start of every node
101  - data following it depends on whether it is an internal or leaf node.
102  - *inode*
103  - node header followed by a number of key pointers
104  - 0x0:0x11 key
105  - 0x11:8 uint = block number
106  - 0x19:8 uint = generation
107  - *lnode*
108  - leaf nodes contain header followed by items; item data is stored at the end of the node
109  - 0x0:0x11 key
110  - 0x11:4 uint = data offset relative to end of header (0x65)
111  - 0x15:4 uint = data size
112 - objects
113  - ROOT_TREE
114  - holds ROOT_ITEMs, ROOT_REFs, and ROOT_BACKREFs for every tree other than itself.
115  - used to find the other trees and to determine the subvol structure.
116  - holds items for the 'root tree directory'. laddr is stored in the superblock
117  - objectIDs
118  - free ids: BTRFS_FIRST_FREE_OBJECTID=256ULL:BTRFS_LAST_FREE_OBJECTID=-256ULL
119  - all other ids are reserved for internal use
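A minimal sketch of the layouts noted above, using Python's =struct= module. It assumes the offsets in the notes are hexadecimal (so the key is 0x11 = 17 bytes and the node header is 0x65 = 101 bytes) and that a superblock copy occupies 4096 bytes. The helper names (=parse_key=, =parse_item=, =superblock_offsets=) are hypothetical and not part of any btrfs tooling.

```python
import struct

# Little-endian, unpadded layouts ("<" disables alignment padding).
KEY = struct.Struct("<QBQ")        # objectid, item type, offset   -> 0x11 bytes
KEY_PTR = struct.Struct("<QBQQQ")  # key, block number, generation -> 0x21 bytes
ITEM = struct.Struct("<QBQII")     # key, data offset, data size   -> 0x19 bytes
HEADER_SIZE = 0x65                 # node header length (assumed from the notes)

# Primary superblock plus the two mirror locations listed in the notes.
SUPERBLOCK_OFFSETS = (0x10000, 0x4000000, 0x4000000000)
SUPERBLOCK_SIZE = 4096             # assumption: size of one superblock copy

FIRST_FREE_OBJECTID = 256
LAST_FREE_OBJECTID = (1 << 64) - 256  # -256ULL viewed as an unsigned 64-bit value


def parse_key(buf, pos=0):
    """Decode the 17-byte key found at byte position pos."""
    objectid, item_type, offset = KEY.unpack_from(buf, pos)
    return {"objectid": objectid, "type": item_type, "offset": offset}


def parse_item(node, index):
    """Return (key, data) for the index-th item of a leaf node buffer."""
    pos = HEADER_SIZE + index * ITEM.size
    key = parse_key(node, pos)
    data_off, data_size = struct.unpack_from("<II", node, pos + KEY.size)
    start = HEADER_SIZE + data_off  # data offset is relative to end of header
    return key, bytes(node[start:start + data_size])


def superblock_offsets(device_size):
    """Superblock locations that fit entirely on a device of the given size."""
    return [o for o in SUPERBLOCK_OFFSETS if o + SUPERBLOCK_SIZE <= device_size]
```

For example, =superblock_offsets(128 * 1024 * 1024)= keeps only the 64 KiB and 64 MiB locations, since the 256 GiB mirror does not fit on a 128 MiB device.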
120 *** send-stream format
121 - [[https://btrfs.readthedocs.io/en/latest/dev/dev-send-stream.html][send stream format]]
122 - Send stream format represents a linear sequence of commands describing actions to be
123  performed on the target filesystem (receive side), created on the source filesystem
124  (send side).
125 - The stream is currently used in two ways: to generate a stream representing a
126  standalone subvolume (full mode) or a difference between two snapshots of the same
127  subvolume (incremental mode).
128 - The stream can be generated using a set of other subvolumes to look for extent
129  references that could lead to a more efficient stream by transferring only the
130  references and not full data.
131 - The stream format is abstracted from on-disk structures (though it may share some
132  BTRFS specifics), the stream instructions could be generated by other means than the
133  send ioctl.
134 - each command is a checksummed header followed by TLV attributes
135 - header: u32 len, u16 cmd, u32 crc32c
136 - data: type, length, raw data
137 - the v2 protocol adds support for encoded write commands
138 - the commands are kinda clunky - need to MKFILE/MKDIR then RENAME to create
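A rough sketch of walking the header+TLV framing described above. It assumes little-endian fields, that =len= counts only the bytes after the 10-byte command header, and that each TLV carries a u16 type and a u16 length (the widths are my assumption; the notes only say "type,length,raw data"). The CRC is read but not verified here, and =parse_command= / =build_command= are hypothetical helper names.

```python
import struct

CMD_HEADER = struct.Struct("<IHI")  # u32 len, u16 cmd, u32 crc32c -> 10 bytes
TLV_HEADER = struct.Struct("<HH")   # u16 type, u16 length (assumed widths)


def parse_command(buf, pos=0):
    """Parse one command starting at pos.

    Returns (cmd, crc, attrs, next_pos), where attrs is a list of
    (tlv_type, value_bytes) pairs. The checksum is not verified.
    """
    length, cmd, crc = CMD_HEADER.unpack_from(buf, pos)
    cur = pos + CMD_HEADER.size
    end = cur + length
    if end > len(buf):
        raise ValueError("truncated command")
    attrs = []
    while cur < end:
        if cur + TLV_HEADER.size > end:
            raise ValueError("truncated TLV header")
        tlv_type, tlv_len = TLV_HEADER.unpack_from(buf, cur)
        cur += TLV_HEADER.size
        if cur + tlv_len > end:
            raise ValueError("truncated TLV value")
        attrs.append((tlv_type, bytes(buf[cur:cur + tlv_len])))
        cur += tlv_len
    return cmd, crc, attrs, end


def build_command(cmd, attrs, crc=0):
    """Inverse of parse_command, for building test input (crc is a placeholder)."""
    body = b"".join(TLV_HEADER.pack(t, len(v)) + v for t, v in attrs)
    return CMD_HEADER.pack(len(body), cmd, crc) + body
```

Calling =parse_command= repeatedly with the returned =next_pos= walks a whole sequence of commands, which matches the "linear sequence of commands" description; a real stream also starts with a stream header that this sketch does not handle.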
139 *** [2023-08-09 Wed] ioctls
140 - magic#: 0x94
141  - https://docs.kernel.org/userspace-api/ioctl/ioctl-number.html
142  - assigned to the Btrfs filesystem; some ioctls were lifted to vfs/generic
143  - defined in fs/btrfs/ioctl.h and linux/fs.h
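To show where the 0x94 magic number ends up in an ioctl request, here is a small sketch of the generic Linux request encoding (8 bits nr, 8 bits type, 14 bits size, 2 bits direction). This layout is an assumption that holds on x86 and ARM; a few architectures use different size/direction widths. The function names =ioc= and =decode= are hypothetical.

```python
# Generic Linux ioctl request layout (assumed; differs on some architectures).
IOC_NRSHIFT = 0
IOC_TYPESHIFT = 8
IOC_SIZESHIFT = 16
IOC_DIRSHIFT = 30

IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2

BTRFS_IOCTL_MAGIC = 0x94


def ioc(direction, magic, nr, size=0):
    """Compose an ioctl request number from its four fields."""
    return (
        (direction << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | (magic << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
    )


def decode(request):
    """Split an ioctl request number back into its fields."""
    return {
        "dir": (request >> IOC_DIRSHIFT) & 0x3,
        "size": (request >> IOC_SIZESHIFT) & 0x3FFF,
        "type": (request >> IOC_TYPESHIFT) & 0xFF,
        "nr": request & 0xFF,
    }
```

A no-argument request with nr 8 under the btrfs magic encodes to =0x9408=, and decoding any request whose =type= field is =0x94= identifies it as belonging to the btrfs range.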
144 ** ZFS
145 -- [cite/t/f:@zfs]
146 
147 - core component of TrueNAS software
148 ** TMPFS
149 -- [cite/t/f:@tmpfs]
150 - in-mem FS
151 ** EXT4
152 -- [cite/t/f:@ext4]
153 ** XFS
154 -- [cite/t/f:@xfs]
155 -- [cite/t/f:@xfs-scalability]
156 * Storage Mediums
157 ** HDD
158 -- [cite/t/f:@hd-failure-ml]
159 ** SSD
160 -- [cite/t/f:@smart-ssd-qp]
161 -- [cite/t/f:@ssd-perf-opt]
162 
163 ** Flash
164 -- [cite/t/f:@flash-openssd-systems]
165 ** NVMe
166 -- [cite/t/f:@nvme-ssd-ux]
167 -- [[https://nvmexpress.org/specifications/][specifications]]
168 *** ZNS
169 -- [cite/t/f:@zns-usenix]
170 #+begin_quote
171 Zoned Storage is an open source, standards-based initiative to enable data centers to
172 scale efficiently for the zettabyte storage capacity era. There are two technologies
173 behind Zoned Storage, Shingled Magnetic Recording (SMR) in ATA/SCSI HDDs and Zoned
174 Namespaces (ZNS) in NVMe SSDs.
175 #+end_quote
176 -- [[https://zonedstorage.io/][zonedstorage.io]]
177 -- $465 8tb 2.5"? [[https://www.serversupply.com/SSD/PCI-E/7.68TB/WESTERN%20DIGITAL/WUS4BB076D7P3E3_332270.htm][retail]]
178 ** eMMC
179 -- [cite/t/f:@emmc-mobile-io]
180 * Linux
181 ** syscalls
182 *** ioctl
183 - [[https://elixir.bootlin.com/linux/latest/source/Documentation/userspace-api/ioctl/ioctl-number.rst][ioctl-numbers]]
184 * Rust
185 ** crates
186 *** nix
187 - [[https://crates.io/crates/nix][crates.io]]
188 *** memmap2
189 - [[https://crates.io/crates/memmap2][crates.io]]
190 *** zstd
191 - [[https://crates.io/crates/zstd][crates.io]]
192 *** rocksdb
193 - [[https://crates.io/crates/rocksdb][crates.io]]
194 *** tokio :tokio:
195 - [[https://crates.io/crates/tokio][crates.io]]
196 *** tracing :tokio:
197 - [[https://crates.io/crates/tracing][crates.io]]
198 **** tracing-subscriber
199 - [[https://crates.io/crates/tracing-subscriber][crates.io]]
200 *** axum :tokio:
201 - [[https://crates.io/crates/axum][crates.io]]
202 *** tower :tokio:
203 - [[https://crates.io/crates/tower][crates.io]]
204 *** uuid
205 - [[https://crates.io/crates/uuid][crates.io]]
206 ** unstable
207 *** lazy_cell
208 - [[https://github.com/rust-lang/rust/issues/109736][tracking-issue]]
209 *** {BTreeMap,BTreeSet}::extract_if
210 - [[https://github.com/rust-lang/rust/issues/70530][tracking-issue]]
211 * Lisp
212 ** ASDF
213 - [[https://gitlab.common-lisp.net/asdf/asdf][gitlab.common-lisp.net]]
214 - [[https://asdf.common-lisp.dev/][common-lisp.dev]]
215 - [[https://github.com/fare/asdf/blob/master/doc/best_practices.md][best-practices]]
216 - includes UIOP
217 ** Reference Projects
218 *** StumpWM
219 - [[https://github.com/stumpwm/stumpwm][github]]
220 *** Nyxt
221 - [[https://github.com/atlas-engineer/nyxt][github]]
222 *** Kons-9
223 - [[https://github.com/kaveh808/kons-9][github]]
224 *** cl-torrents
225 - [[https://github.com/vindarel/cl-torrents][github]]
226 *** Mezzano
227 - [[https://github.com/froggey/Mezzano][github]]
228 *** yalo
229 - [[https://github.com/whily/yalo][github]]
230 *** cl-ledger
231 - [[https://github.com/ledger/cl-ledger][github]]
232 *** Lem
233 - [[https://github.com/lem-project/lem][github]]
234 *** kindista
235 - [[https://github.com/kindista/kindista][github]]
236 *** lisp-chat
237 - [[https://github.com/ryukinix/lisp-chat][github]]
238 * Refs
239 #+print_bibliography: