Mercurial > org > docs / nas-t/notes.org

changeset 5: bb51c61e4d4b
parent: bd85a72319d8
child: a0017112db77
author: ellis <ellis@rwest.io>
date: Fri, 24 Nov 2023 22:39:07 -0500
permissions: -rw-r--r--
description: blog update
{{{header(notes,
Richard Westhaver,
ellis@rwest.io,
NAS-T Notes)}}}
#+BIBLIOGRAPHY: refs.bib
* File Systems
** BTRFS
#+begin_quote
BTRFS is a Linux filesystem based on copy-on-write, allowing for
efficient snapshots and clones.

It uses B-trees as its main on-disk data structure. The design goal is
to work well for many use cases and workloads. To this end, much
effort has been directed to maintaining even performance as the
filesystem ages, rather than trying to support a particular narrow
benchmark use-case.

Linux filesystems are installed on smartphones as well as enterprise
servers. This entails challenges on many different fronts.

- Scalability :: The filesystem must scale in many dimensions: disk
  space, memory, and CPUs.

- Data integrity :: Losing data is not an option, and much effort is
  expended to safeguard the content. This includes checksums, metadata
  duplication, and RAID support built into the filesystem.

- Disk diversity :: The system should work well with SSDs and hard
  disks. It is also expected to be able to use an array of different
  sized disks, which poses challenges to the RAID and striping
  mechanisms.
#+end_quote
-- [cite/t/f:@btrfs]
*** [2023-08-08 Tue] btrfs performance speculation ::
- [[https://www.percona.com/blog/taking-a-look-at-btrfs-for-mysql/]]
  - zfs outperforms btrfs by a wide margin here, but the btrfs side may be misconfigured
    (virt + CoW still enabled?)
- https://www.ctrl.blog/entry/btrfs-vs-ext4-performance.html
  - see the author's follow-up comment on this post
    - https://www.reddit.com/r/archlinux/comments/o2gc42/is_the_performance_hit_of_btrfs_serious_is_it/
      #+begin_quote
      I’m the author of OP’s first link. I use BtrFS today. I often shift lots of
      de-duplicatable data around, and benefit greatly from file cloning. The data is actually
      the same data that caused the slow performance in the article. BtrFS and file cloning
      now performs this task quicker than a traditional file system. (Hm. It’s time for a
      follow-up article.)

      In a laptop with one drive: it doesn’t matter too much unless you do work that benefit
      from file cloning or snapshots. This will likely require you to adjust your tooling and
      workflow. I’ve had to rewrite the software I use every day to make it take advantage of
      the capabilities of a more modern file system. You won’t benefit much from the data
      recovery and redundancy features unless you’ve got two storage drives in your laptop and
      can setup redundant data copies.

      on similar hardware to mine?

      It’s not a question about your hardware as much as how you use it. The bad performance I
      documented was related to lots and lots of simultaneous random reads and writes. This
      might not be representative of how you use your computer.
      #+end_quote
- https://dl.acm.org/doi/fullHtml/10.1145/3386362
  - this is about distributed file systems (in this case Ceph): the authors argue against
    basing a DFS on on-disk-format filesystems (XFS, ext4) and developed BlueStore as a
    backend, which runs directly on raw storage hardware.
    - this is a good approach, but expensive (2 years in development) and risky
    - a better approach is to take advantage of a powerful enough existing on-disk FS
      format and pair it with supporting modules which abstract away the 'distributed'
      mechanics.
  - the strategy presented here is critical for enterprise-grade hardware, where the
    on-disk filesystem becomes the bottleneck that you're looking to optimize
- https://lore.kernel.org/lkml/cover.1676908729.git.dsterba@suse.com/
  - linux 6.3 patch by David Sterba [2023-02-20 Mon]
  - btrfs continues to show improvements in the linux kernel, ironing out the kinks
    - makes it hard to compare benchmarks across kernel versions tho :/
*** MacOS support
- see this WIP kext (kernel extension) for macOS: [[https://github.com/relalis/macos-btrfs][macos-btrfs]]
  - maybe we can help out with the VFS/mount support
*** on-disk format
- [[https://btrfs.readthedocs.io/en/latest/dev/On-disk-format.html][on-disk-format]]
- 'btrfs consists entirely of several trees. the trees use copy-on-write.'
- trees are stored in nodes, each of which belongs to a level in the b-tree structure.
- internal nodes (abbreviated 'inode' below; not the same thing as a file inode) contain
  refs to other internal nodes on the /next/ level OR
  - to leaf nodes when the level reaches 0.
- leaf nodes contain items of various types, depending on the tree.
- basic structures (layouts are written offset:size in bytes, offsets in hex)
  - *key* (0x11 = 17 bytes)
    - 0x00:8 uint = objectid, each tree has its own set of object IDs
    - 0x08:1 uint = item type
    - 0x09:8 uint = offset, meaning depends on the item type.
    - all fields are little-endian
    - all fields are unsigned
  - *superblock*
    - primary superblock is located at 0x10000 (64KiB)
    - mirror copies of the superblock are located at physical addresses 0x4000000
      (64MiB) and 0x4000000000 (256GiB), if valid. copies are updated simultaneously.
    - during mount only the first superblock at 0x10000 is read; an error there causes
      the mount to fail.
    - BTRFS only recognizes disks with a valid 0x10000 superblock.
  - *header*
    - stored at the start of every node (internal or leaf)
    - data following it depends on whether it is an internal or leaf node.
  - *inode* (internal node)
    - node header followed by a number of key pointers
    - 0x00:17 key
    - 0x11:8 uint = block number
    - 0x19:8 uint = generation
  - *lnode* (leaf node)
    - leaf nodes contain a header followed by items (a key plus the location of its data)
    - 0x00:17 key
    - 0x11:4 uint = data offset relative to the end of the header (0x65 = 101 bytes)
    - 0x15:4 uint = data size
- objects
  - ROOT_TREE
    - holds ROOT_ITEMs, ROOT_REFs, and ROOT_BACKREFs for every tree other than itself.
    - used to find the other trees and to determine the subvol structure.
    - holds items for the 'root tree directory'. the ROOT_TREE's logical address is
      stored in the superblock
  - objectIDs
    - free ids: BTRFS_FIRST_FREE_OBJECTID=256ULL:BTRFS_LAST_FREE_OBJECTID=-256ULL
    - ids outside that range are reserved for internal use
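A minimal Rust sketch of the layouts noted above, std only. All type and function names here (=Key=, =KeyPtr=, =LeafItem=, =parse_leaf_item=, ...) are made up for illustration and do not come from any existing crate. It assumes the key is 17 bytes, the node header is 0x65 bytes, and every integer is little-endian and unsigned, as listed in the notes.

```rust
// Sketch only: names are illustrative, sizes and offsets follow the notes above.

/// On-disk key size: objectid (8) + item type (1) + offset (8).
pub const KEY_SIZE: usize = 0x11;
/// Size of the header stored at the start of every node.
pub const HEADER_SIZE: usize = 0x65;
/// Physical offsets of the primary superblock and its two mirrors.
pub const SUPERBLOCK_OFFSETS: [u64; 3] = [0x1_0000, 0x400_0000, 0x40_0000_0000];
/// Free object ID range; -256ULL wraps around to 0xFFFF_FFFF_FFFF_FF00.
pub const FIRST_FREE_OBJECTID: u64 = 256;
pub const LAST_FREE_OBJECTID: u64 = 256u64.wrapping_neg();

/// Field order matters: the derived `Ord` compares objectid, then item type,
/// then offset. Keys must be compared after decoding, because a raw
/// byte-by-byte comparison of little-endian fields gives the wrong order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Key {
    pub objectid: u64,
    pub item_type: u8,
    pub offset: u64,
}

/// Entry of an internal node: key + block number + generation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeyPtr {
    pub key: Key,
    pub block_number: u64,
    pub generation: u64,
}

/// Entry of a leaf node: key + location of the item data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LeafItem {
    pub key: Key,
    pub data_offset: u32,
    pub data_size: u32,
}

fn u64_le(buf: &[u8], at: usize) -> Option<u64> {
    let mut b = [0u8; 8];
    b.copy_from_slice(buf.get(at..at + 8)?);
    Some(u64::from_le_bytes(b))
}

fn u32_le(buf: &[u8], at: usize) -> Option<u32> {
    let mut b = [0u8; 4];
    b.copy_from_slice(buf.get(at..at + 4)?);
    Some(u32::from_le_bytes(b))
}

impl Key {
    /// Returns `None` when `buf` is shorter than `KEY_SIZE`.
    pub fn parse(buf: &[u8]) -> Option<Key> {
        Some(Key {
            objectid: u64_le(buf, 0x00)?,
            item_type: *buf.get(0x08)?,
            offset: u64_le(buf, 0x09)?,
        })
    }
}

pub fn parse_key_ptr(buf: &[u8]) -> Option<KeyPtr> {
    Some(KeyPtr {
        key: Key::parse(buf)?,
        block_number: u64_le(buf, 0x11)?,
        generation: u64_le(buf, 0x19)?,
    })
}

pub fn parse_leaf_item(buf: &[u8]) -> Option<LeafItem> {
    Some(LeafItem {
        key: Key::parse(buf)?,
        data_offset: u32_le(buf, 0x11)?,
        data_size: u32_le(buf, 0x15)?,
    })
}

/// Byte range of an item's data within its leaf block; the stored offset is
/// relative to the end of the header.
pub fn item_data_range(item: &LeafItem) -> std::ops::Range<usize> {
    let start = HEADER_SIZE + item.data_offset as usize;
    start..start + item.data_size as usize
}
```

Returning =Option= keeps truncated buffers from panicking. A real reader would also want to verify the header checksum before trusting any of these fields, which this sketch skips.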
*** send-stream format
- [[https://btrfs.readthedocs.io/en/latest/dev/dev-send-stream.html][send stream format]]
- Send stream format represents a linear sequence of commands describing actions to be
  performed on the target filesystem (receive side), created on the source filesystem
  (send side).
- The stream is currently used in two ways: to generate a stream representing a
  standalone subvolume (full mode) or a difference between two snapshots of the same
  subvolume (incremental mode).
- The stream can be generated using a set of other subvolumes to look for extent
  references that could lead to a more efficient stream by transferring only the
  references and not full data.
- The stream format is abstracted from on-disk structures (though it may share some
  BTRFS specifics), the stream instructions could be generated by other means than the
  send ioctl.
- it's a checksum+TLV (type-length-value) format
  - header: u32 len, u16 cmd, u32 crc32c
  - data: type, length, raw data
- the v2 protocol supports the encoded commands
- the commands are kinda clunky - need to MKFILE/MKDIR then RENAME to create a file or
  directory
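A rough Rust sketch of walking one command in that checksum+TLV layout, std only. The names (=Command=, =parse_command=) are hypothetical. It assumes the command header is exactly u32 len, u16 cmd, u32 crc32c, that =len= counts only the payload after the header, and that each attribute is a u16 type and u16 length followed by the raw data. The attribute field widths are not stated in the notes above, so treat them as an assumption to check against the linked docs. The stream header and v2's encoded data are not handled, and the CRC is returned without being verified.

```rust
// Sketch only: v1-style attributes assumed; names are illustrative.

/// u32 len + u16 cmd + u32 crc32c.
pub const CMD_HEADER_SIZE: usize = 10;

#[derive(Debug, PartialEq, Eq)]
pub struct Command<'a> {
    pub cmd: u16,
    pub crc32c: u32,
    /// (attribute type, raw attribute data) in stream order.
    pub attrs: Vec<(u16, &'a [u8])>,
}

fn u16_le(buf: &[u8], at: usize) -> Option<u16> {
    let mut b = [0u8; 2];
    b.copy_from_slice(buf.get(at..at + 2)?);
    Some(u16::from_le_bytes(b))
}

fn u32_le(buf: &[u8], at: usize) -> Option<u32> {
    let mut b = [0u8; 4];
    b.copy_from_slice(buf.get(at..at + 4)?);
    Some(u32::from_le_bytes(b))
}

/// Parses one command from the front of `buf` and returns it together with
/// the bytes that follow it. Returns `None` on any truncated field.
pub fn parse_command<'a>(buf: &'a [u8]) -> Option<(Command<'a>, &'a [u8])> {
    let len = u32_le(buf, 0)? as usize;
    let cmd = u16_le(buf, 4)?;
    let crc32c = u32_le(buf, 6)?;

    let end = CMD_HEADER_SIZE.checked_add(len)?;
    let mut payload = buf.get(CMD_HEADER_SIZE..end)?;

    let mut attrs = Vec::new();
    while !payload.is_empty() {
        let attr_type = u16_le(payload, 0)?;
        let attr_len = u16_le(payload, 2)? as usize;
        let data = payload.get(4..4 + attr_len)?;
        attrs.push((attr_type, data));
        payload = &payload[4 + attr_len..];
    }

    Some((Command { cmd, crc32c, attrs }, &buf[end..]))
}
```

Calling =parse_command= in a loop on the returned remainder walks the whole command sequence. Borrowing the attribute data from the input buffer avoids copying file contents while scanning.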
*** [2023-08-09 Wed] ioctls
- magic#: 0x94
  - https://docs.kernel.org/userspace-api/ioctl/ioctl-number.html
  - registered there as 'Btrfs filesystem', with some ioctls lifted to vfs/generic
  - defined in fs/btrfs/ioctl.h and linux/fs.h
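To see how the 0x94 magic ends up in a request number, here is a small Rust sketch of the kernel's =_IOC()= encoding, std only. It assumes the common bit layout used on x86 and arm (8 bits nr, 8 bits type, 14 bits size, 2 bits direction); some architectures differ. The helper names (=ioc=, =ioc_type=) are made up for illustration.

```rust
// Sketch only: mirrors the generic _IOC() layout; not valid for every arch.

pub const BTRFS_IOCTL_MAGIC: u32 = 0x94;

const NR_BITS: u32 = 8;
const TYPE_BITS: u32 = 8;
const SIZE_BITS: u32 = 14;

const NR_SHIFT: u32 = 0;
const TYPE_SHIFT: u32 = NR_SHIFT + NR_BITS; // 8
const SIZE_SHIFT: u32 = TYPE_SHIFT + TYPE_BITS; // 16
const DIR_SHIFT: u32 = SIZE_SHIFT + SIZE_BITS; // 30

/// Direction bits, named from userspace's point of view.
pub const DIR_NONE: u32 = 0;
pub const DIR_WRITE: u32 = 1;
pub const DIR_READ: u32 = 2;

/// Packs direction, magic ("type"), command number and argument size.
pub const fn ioc(dir: u32, ty: u32, nr: u32, size: u32) -> u32 {
    (dir << DIR_SHIFT) | (size << SIZE_SHIFT) | (ty << TYPE_SHIFT) | (nr << NR_SHIFT)
}

/// Extracts the magic number back out of a request number.
pub const fn ioc_type(req: u32) -> u32 {
    (req >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
}

/// _IO(0x94, 8): takes no argument.
pub const BTRFS_IOC_SYNC: u32 = ioc(DIR_NONE, BTRFS_IOCTL_MAGIC, 8, 0);

/// _IOW(0x94, 9, int): the clone ioctl that was lifted into linux/fs.h.
pub const FICLONE: u32 = ioc(DIR_WRITE, BTRFS_IOCTL_MAGIC, 9, 4);
```

=FICLONE= works out to 0x40049409 and still carries the btrfs magic in bits 8..16, which is why a generic VFS ioctl shows up under 0x94 in the ioctl number table. In practice the ioctl macros in the nix crate listed under Rust cover the same encoding.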
** ZFS
-- [cite/t/f:@zfs]

- core component of TrueNAS software
** TMPFS
-- [cite/t/f:@tmpfs]
- in-mem FS
** EXT4
-- [cite/t/f:@ext4]
** XFS
-- [cite/t/f:@xfs]
-- [cite/t/f:@xfs-scalability]
* Storage Mediums
** HDD
-- [cite/t/f:@hd-failure-ml]
** SSD
-- [cite/t/f:@smart-ssd-qp]
-- [cite/t/f:@ssd-perf-opt]

** Flash
-- [cite/t/f:@flash-openssd-systems]
** NVMe
-- [cite/t/f:@nvme-ssd-ux]
-- [[https://nvmexpress.org/specifications/][specifications]]
*** ZNS
-- [cite/t/f:@zns-usenix]
#+begin_quote
Zoned Storage is an open source, standards-based initiative to enable data centers to
scale efficiently for the zettabyte storage capacity era. There are two technologies
behind Zoned Storage, Shingled Magnetic Recording (SMR) in ATA/SCSI HDDs and Zoned
Namespaces (ZNS) in NVMe SSDs.
#+end_quote
-- [[https://zonedstorage.io/][zonedstorage.io]]
-- $465 8tb 2.5"? [[https://www.serversupply.com/SSD/PCI-E/7.68TB/WESTERN%20DIGITAL/WUS4BB076D7P3E3_332270.htm][retail]]
** eMMC
-- [cite/t/f:@emmc-mobile-io]
* Linux
** syscalls
*** ioctl
- [[https://elixir.bootlin.com/linux/latest/source/Documentation/userspace-api/ioctl/ioctl-number.rst][ioctl-numbers]]
* Rust
** crates
*** nix
- [[https://crates.io/crates/nix][crates.io]]
*** memmap2
- [[https://crates.io/crates/memmap2][crates.io]]
*** zstd
- [[https://crates.io/crates/zstd][crates.io]]
*** rocksdb
- [[https://crates.io/crates/rocksdb][crates.io]]
*** tokio :tokio:
- [[https://crates.io/crates/tokio][crates.io]]
*** tracing :tokio:
- [[https://crates.io/crates/tracing][crates.io]]
**** tracing-subscriber
- [[https://crates.io/crates/tracing-subscriber][crates.io]]
*** axum :tokio:
- [[https://crates.io/crates/axum][crates.io]]
*** tower :tokio:
- [[https://crates.io/crates/tower][crates.io]]
*** uuid
- [[https://crates.io/crates/uuid][crates.io]]
** unstable
*** lazy_cell
- [[https://github.com/rust-lang/rust/issues/109736][tracking-issue]]
*** {BTreeMap,BTreeSet}::extract_if
- [[https://github.com/rust-lang/rust/issues/70530][tracking-issue]]
* Lisp
** ASDF
- [[https://gitlab.common-lisp.net/asdf/asdf][gitlab.common-lisp.net]]
- [[https://asdf.common-lisp.dev/][common-lisp.dev]]
- [[https://github.com/fare/asdf/blob/master/doc/best_practices.md][best-practices]]
- includes UIOP
** Reference Projects
*** StumpWM
- [[https://github.com/stumpwm/stumpwm][github]]
*** Nyxt
- [[https://github.com/atlas-engineer/nyxt][github]]
*** Kons-9
- [[https://github.com/kaveh808/kons-9][github]]
*** cl-torrents
- [[https://github.com/vindarel/cl-torrents][github]]
*** Mezzano
- [[https://github.com/froggey/Mezzano][github]]
*** yalo
- [[https://github.com/whily/yalo][github]]
*** cl-ledger
- [[https://github.com/ledger/cl-ledger][github]]
*** Lem
- [[https://github.com/lem-project/lem][github]]
*** kindista
- [[https://github.com/kindista/kindista][github]]
*** lisp-chat
- [[https://github.com/ryukinix/lisp-chat][github]]
* Refs
#+print_bibliography: