Mercurial > org > notes / nas-t.org

changeset 7: d543f73892d3
child: 4839b0675118
author: Richard Westhaver <ellis@rwest.io>
date: Sat, 22 Jun 2024 23:56:08 -0400
permissions: -rw-r--r--
description: add nas-t notes
1 #+BIBLIOGRAPHY: refs.bib
2 * File Systems
3 ** BTRFS
4 #+begin_quote
5 BTRFS is a Linux filesystem based on copy-on-write, allowing for
6 efficient snapshots and clones.
7 
8 It uses B-trees as its main on-disk data structure. The design goal is
9 to work well for many use cases and workloads. To this end, much
10 effort has been directed to maintaining even performance as the
11 filesystem ages, rather than trying to support a particular narrow
12 benchmark use-case.
13 
14 Linux filesystems are installed on smartphones as well as enterprise
15 servers. This entails challenges on many different fronts.
16 
17 - Scalability :: The filesystem must scale in many dimensions: disk
18  space, memory, and CPUs.
19 
20 - Data integrity :: Losing data is not an option, and much effort is
21  expended to safeguard the content. This includes checksums, metadata
22  duplication, and RAID support built into the filesystem.
23 
24 - Disk diversity :: The system should work well with SSDs and hard
25  disks. It is also expected to be able to use an array of different
26  sized disks, which poses challenges to the RAID and striping
27  mechanisms.
28 #+end_quote
29 -- [cite/t/f:@btrfs]
30 *** [2023-08-08 Tue] btrfs performance speculation ::
31  - [[https://www.percona.com/blog/taking-a-look-at-btrfs-for-mysql/]]
32  - ZFS outperforms btrfs by a wide margin here, but the btrfs side may be misconfigured
33  (virtualized, with CoW still enabled?)
34  - https://www.ctrl.blog/entry/btrfs-vs-ext4-performance.html
35  - see the author's follow-up comment on this post:
36  - https://www.reddit.com/r/archlinux/comments/o2gc42/is_the_performance_hit_of_btrfs_serious_is_it/
37  #+begin_quote
38  I’m the author of OP’s first link. I use BtrFS today. I often shift lots of
39  de-duplicatable data around, and benefit greatly from file cloning. The data is actually
40  the same data that caused the slow performance in the article. BtrFS and file cloning
41  now performs this task quicker than a traditional file system. (Hm. It’s time for a
42  follow-up article.)
43 
44  In a laptop with one drive: it doesn’t matter too much unless you do work that benefit
45  from file cloning or snapshots. This will likely require you to adjust your tooling and
46  workflow. I’ve had to rewrite the software I use every day to make it take advantage of
47  the capabilities of a more modern file system. You won’t benefit much from the data
48  recovery and redundancy features unless you’ve got two storage drives in your laptop and
49  can setup redundant data copies.
50 
51  on similar hardware to mine?
52 
53  It’s not a question about your hardware as much as how you use it. The bad performance I
54  documented was related to lots and lots of simultaneous random reads and writes. This
55  might not be representative of how you use your computer.
56  #+end_quote
57  - https://dl.acm.org/doi/fullHtml/10.1145/3386362
58  - this is about distributed file systems (in this case Ceph): the authors argue against
59  building a DFS on top of on-disk filesystems (XFS, ext4) and developed BlueStore as a
60  backend, which runs directly on raw storage hardware.
61  - this is a good approach, but expensive (2 years in development) and risky
62  - a better approach is to take advantage of a powerful enough existing on-disk FS
63  format and pair it with supporting modules which abstract away the 'distributed'
64  mechanics.
65  - the strategy presented here is critical for enterprise-grade hardware where the
66  on-disk filesystem becomes the bottleneck that you're looking to optimize
67  - https://lore.kernel.org/lkml/cover.1676908729.git.dsterba@suse.com/
68  - linux 6.3 patch by David Sterba [2023-02-20 Mon]
69  - btrfs continues to show improvements in the linux kernel, ironing out the kinks
70  - makes it hard to compare benchmarks tho :/
71 *** macOS support
72 - see this WIP kext for macOS: [[https://github.com/relalis/macos-btrfs][macos-btrfs]]
73  - maybe we can help out with the VFS/mount support
74 *** on-disk format
75 - [[https://btrfs.readthedocs.io/en/latest/dev/On-disk-format.html][on-disk-format]]
76 - 'btrfs consists entirely of several trees. the trees use copy-on-write.'
77 - trees are stored in nodes which belong to a level in the b-tree structure.
78 - internal nodes (shortened to 'inodes' in these notes, not file inodes) contain refs to other inodes on the /next/ level OR
79  - to leaf nodes when the level reaches 0.
80 - leaf nodes contain items of various types, depending on the tree.
81 - basic structures (fields given as offset:size)
82  - key 0:8 uint = objectid, each tree has its own set of object IDs
83  - key 8:1 uint = item type
84  - key 9:8 uint = offset, meaning depends on the item type.
85  - all values are little-endian
86  - fields are unsigned
87  - *superblock*
88  - primary superblock is located at 0x10000 (64KiB)
89  - Mirror copies of the superblock are located at physical addresses 0x4000000 (64 MiB)
90  and 0x4000000000 (256 GiB), if those locations are valid. Copies are updated simultaneously.
91  - during mount only the first superblock at 0x10000 is read; an error there causes the mount to
92  fail.
93  - BTRFS only recognizes disks with a valid 0x10000 superblock.
94  - *header*
95  - stored at the start of every node
96  - data following it depends on whether it is an internal or leaf node.
97  - *inode*
98  - node header followed by a number of key pointers
99  - 0:0x11 key
100  - 0x11:8 uint = block number
101  - 0x19:8 uint = generation
102  - *lnode*
103  - leaf nodes contain header followed by items
104  - 0:0x11 key
105  - 0x11:4 uint = data offset relative to end of header (0x65)
106  - 0x15:4 uint = data size
107 - objects
108  - ROOT_TREE
109  - holds ROOT_ITEMs, ROOT_REFs, and ROOT_BACKREFs for every tree other than itself.
110  - used to find the other trees and to determine the subvol structure.
111  - holds items for the 'root tree directory'. The root tree's logical address is stored in the superblock
112  - objectIDs
113  - free ids: BTRFS_FIRST_FREE_OBJECTID=256ULL through BTRFS_LAST_FREE_OBJECTID=-256ULL
114  - IDs outside that range are reserved for internal use
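
As a rough illustration of the layouts listed above, the key, key pointer, and leaf item can be unpacked with Python's =struct= module. This is only a sketch, not btrfs code: the function names are made up for these notes, and the 4096-byte superblock size used to decide which mirror offsets fit on a device is an assumption that is not stated above.

#+begin_src python
import struct

# Layouts as described in the notes (little-endian, unsigned):
#   key:         objectid u64, type u8, offset u64       -> 0x11 bytes
#   key pointer: key, block number u64, generation u64   -> 0x21 bytes
#   leaf item:   key, data offset u32, data size u32     -> 0x19 bytes
KEY = struct.Struct("<QBQ")
KEY_PTR = struct.Struct("<QBQQQ")
LEAF_ITEM = struct.Struct("<QBQII")

HEADER_SIZE = 0x65  # item data offsets are relative to the end of the header

SUPERBLOCK_OFFSETS = (0x10000, 0x4000000, 0x4000000000)
FIRST_FREE_OBJECTID = 256
LAST_FREE_OBJECTID = (1 << 64) - 256  # -256ULL read as an unsigned 64-bit value


def parse_key(buf, pos=0):
    """Decode a 17-byte key starting at pos."""
    objectid, item_type, offset = KEY.unpack_from(buf, pos)
    return {"objectid": objectid, "type": item_type, "offset": offset}


def parse_key_ptr(buf, pos=0):
    """Decode one key pointer from an internal node."""
    objectid, item_type, offset, blocknr, generation = KEY_PTR.unpack_from(buf, pos)
    key = {"objectid": objectid, "type": item_type, "offset": offset}
    return {"key": key, "blocknr": blocknr, "generation": generation}


def parse_leaf_item(buf, pos=0):
    """Decode one leaf item and compute where its data starts in the node."""
    objectid, item_type, offset, data_offset, data_size = LEAF_ITEM.unpack_from(buf, pos)
    key = {"objectid": objectid, "type": item_type, "offset": offset}
    return {
        "key": key,
        "data_start": HEADER_SIZE + data_offset,  # absolute offset in the node
        "data_size": data_size,
    }


def superblock_offsets(device_size, sb_size=4096):
    """Superblock locations that fit on a device (sb_size is an assumption)."""
    return [off for off in SUPERBLOCK_OFFSETS if off + sb_size <= device_size]


def is_free_objectid(objectid):
    """True if the ID lies in the range not reserved for internal use."""
    return FIRST_FREE_OBJECTID <= objectid <= LAST_FREE_OBJECTID
#+end_src

For example, =superblock_offsets(128 * 2**20)= keeps only the primary copy and the 64 MiB mirror, since the 256 GiB mirror does not exist on a 128 MiB device.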
115 *** send-stream format
116 - [[https://btrfs.readthedocs.io/en/latest/dev/dev-send-stream.html][send stream format]]
117 - Send stream format represents a linear sequence of commands describing actions to be
118  performed on the target filesystem (receive side), created on the source filesystem
119  (send side).
120 - The stream is currently used in two ways: to generate a stream representing a
121  standalone subvolume (full mode) or a difference between two snapshots of the same
122  subvolume (incremental mode).
123 - The stream can be generated using a set of other subvolumes to look for extent
124  references that could lead to a more efficient stream by transferring only the
125  references and not full data.
126 - The stream format is abstracted from on-disk structures (though it may share some
127  BTRFS specifics), the stream instructions could be generated by other means than the
128  send ioctl.
129 - it's a checksum+TLV format
130 - header: u32 len, u16 cmd, u32 crc32c
131 - data: type, length, raw data
132 - the v2 protocol adds support for encoded commands
133 - the commands are kinda clunky - need to MKFILE/MKDIR then RENAME to create a file or directory
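
The checksum+TLV framing noted above can be sketched as a small encoder/decoder pair. Several details here are assumptions rather than facts from these notes: each attribute is framed as a u16 type and u16 length, =len= counts only the payload after the header, the CRC covers the header (with its crc field zeroed) plus the payload, and the standard CRC-32C seeding is used. The seeding that btrfs actually uses should be checked against the send stream docs before relying on this. The command and attribute numbers in the usage note are placeholders.

#+begin_src python
import struct

CMD_HEADER = struct.Struct("<IHI")  # u32 len, u16 cmd, u32 crc32c
TLV_HEADER = struct.Struct("<HH")   # assumed: u16 type, u16 length


def crc32c(data, crc=0):
    """Bitwise CRC-32C (Castagnoli) with the standard init/final inversion."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def encode_command(cmd, attrs):
    """Frame (type, bytes) attributes as one command with a checksum."""
    payload = b"".join(TLV_HEADER.pack(t, len(v)) + v for t, v in attrs)
    zeroed = CMD_HEADER.pack(len(payload), cmd, 0)
    crc = crc32c(zeroed + payload)
    return CMD_HEADER.pack(len(payload), cmd, crc) + payload


def decode_command(buf, pos=0):
    """Return (cmd, attrs, next_pos); raise ValueError on bad framing."""
    length, cmd, crc = CMD_HEADER.unpack_from(buf, pos)
    start = pos + CMD_HEADER.size
    payload = bytes(buf[start:start + length])
    if len(payload) != length:
        raise ValueError("truncated command")
    if crc32c(CMD_HEADER.pack(length, cmd, 0) + payload) != crc:
        raise ValueError("checksum mismatch")
    attrs = []
    p = 0
    while p < len(payload):
        tlv_type, tlv_len = TLV_HEADER.unpack_from(payload, p)
        p += TLV_HEADER.size
        attrs.append((tlv_type, payload[p:p + tlv_len]))
        p += tlv_len
    return cmd, attrs, start + length
#+end_src

A round trip such as =decode_command(encode_command(3, [(15, b"o257-7-0")]))= returns the same command number and attribute list, and flipping any byte of the frame raises =ValueError=.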
134 *** [2023-08-09 Wed] ioctls
135 - magic#: 0x94
136  - https://docs.kernel.org/userspace-api/ioctl/ioctl-number.html
137  - listed there as 'Btrfs filesystem', with some ioctls lifted to vfs/generic
138  - defined in fs/btrfs/ioctl.h and linux/fs.h
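
To see where the 0x94 magic ends up in a request number, here is a minimal sketch of the ioctl number encoding. It assumes the asm-generic layout (nr in bits 0-7, magic in bits 8-15, argument size in bits 16-29, direction in bits 30-31); some architectures use a different layout. The helper names mirror the C macros but are otherwise made up for these notes.

#+begin_src python
BTRFS_IOCTL_MAGIC = 0x94

# Bit layout assumed from the asm-generic ioctl encoding.
NR_SHIFT = 0
TYPE_SHIFT = 8
SIZE_SHIFT = 16
DIR_SHIFT = 30

IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2


def ioc(direction, magic, nr, size):
    """Pack direction, magic, request number and argument size."""
    return (
        (direction << DIR_SHIFT)
        | (size << SIZE_SHIFT)
        | (magic << TYPE_SHIFT)
        | (nr << NR_SHIFT)
    )


def io(magic, nr):
    """Request without an argument (like the C _IO macro)."""
    return ioc(IOC_NONE, magic, nr, 0)


def iow(magic, nr, size):
    """Request that passes `size` bytes to the kernel (like _IOW)."""
    return ioc(IOC_WRITE, magic, nr, size)


def ior(magic, nr, size):
    """Request that reads `size` bytes from the kernel (like _IOR)."""
    return ioc(IOC_READ, magic, nr, size)


def decode_ioctl(number):
    """Split a request number back into its four fields."""
    return {
        "dir": (number >> DIR_SHIFT) & 0x3,
        "size": (number >> SIZE_SHIFT) & 0x3FFF,
        "magic": (number >> TYPE_SHIFT) & 0xFF,
        "nr": (number >> NR_SHIFT) & 0xFF,
    }
#+end_src

With this layout =io(BTRFS_IOCTL_MAGIC, 8)= is 0x9408 and =iow(BTRFS_IOCTL_MAGIC, 9, 4)= is 0x40049409. Which named ioctls those request numbers correspond to should be looked up in fs/btrfs/ioctl.h and linux/fs.h.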
139 ** ZFS
140 -- [cite/t/f:@zfs]
141 
142 - core component of TrueNAS software
143 ** TMPFS
144 -- [cite/t/f:@tmpfs]
145 - in-mem FS
146 ** EXT4
147 -- [cite/t/f:@ext4]
148 ** XFS
149 -- [cite/t/f:@xfs]
150 -- [cite/t/f:@xfs-scalability]
151 * Storage Media
152 ** HDD
153 -- [cite/t/f:@hd-failure-ml]
154 ** SSD
155 -- [cite/t/f:@smart-ssd-qp]
156 -- [cite/t/f:@ssd-perf-opt]
157 
158 ** Flash
159 -- [cite/t/f:@flash-openssd-systems]
160 ** NVMe
161 -- [cite/t/f:@nvme-ssd-ux]
162 -- [[https://nvmexpress.org/specifications/][specifications]]
163 *** ZNS
164 -- [cite/t/f:@zns-usenix]
165 #+begin_quote
166 Zoned Storage is an open source, standards-based initiative to enable data centers to
167 scale efficiently for the zettabyte storage capacity era. There are two technologies
168 behind Zoned Storage, Shingled Magnetic Recording (SMR) in ATA/SCSI HDDs and Zoned
169 Namespaces (ZNS) in NVMe SSDs.
170 #+end_quote
171 -- [[https://zonedstorage.io/][zonedstorage.io]]
172 -- $465 8tb 2.5"? [[https://www.serversupply.com/SSD/PCI-E/7.68TB/WESTERN%20DIGITAL/WUS4BB076D7P3E3_332270.htm][retail]]
173 ** eMMC
174 -- [cite/t/f:@emmc-mobile-io]
175 * Linux
176 ** syscalls
177 *** ioctl
178 - [[https://elixir.bootlin.com/linux/latest/source/Documentation/userspace-api/ioctl/ioctl-number.rst][ioctl-numbers]]
179 * Rust
180 ** crates
181 *** nix
182 - [[https://crates.io/crates/nix][crates.io]]
183 *** memmap2
184 - [[https://crates.io/crates/memmap2][crates.io]]
185 *** zstd
186 - [[https://crates.io/crates/zstd][crates.io]]
187 *** rocksdb
188 - [[https://crates.io/crates/rocksdb][crates.io]]
189 *** tokio :tokio:
190 - [[https://crates.io/crates/tokio][crates.io]]
191 *** tracing :tokio:
192 - [[https://crates.io/crates/tracing][crates.io]]
193 **** tracing-subscriber
194 - [[https://crates.io/crates/tracing-subscriber][crates.io]]
195 *** axum :tokio:
196 - [[https://crates.io/crates/axum][crates.io]]
197 *** tower :tokio:
198 - [[https://crates.io/crates/tower][crates.io]]
199 *** uuid
200 - [[https://crates.io/crates/uuid][crates.io]]
201 ** unstable
202 *** lazy_cell
203 - [[https://github.com/rust-lang/rust/issues/109736][tracking-issue]]
204 *** {BTreeMap,BTreeSet}::extract_if
205 - [[https://github.com/rust-lang/rust/issues/70530][tracking-issue]]
206 * Lisp
207 ** ASDF
208 - [[https://gitlab.common-lisp.net/asdf/asdf][gitlab.common-lisp.net]]
209 - [[https://asdf.common-lisp.dev/][common-lisp.dev]]
210 - [[https://github.com/fare/asdf/blob/master/doc/best_practices.md][best-practices]]
211 - includes UIOP
212 ** Reference Projects
213 *** StumpWM
214 - [[https://github.com/stumpwm/stumpwm][github]]
215 *** Nyxt
216 - [[https://github.com/atlas-engineer/nyxt][github]]
217 *** Kons-9
218 - [[https://github.com/kaveh808/kons-9][github]]
219 *** cl-torrents
220 - [[https://github.com/vindarel/cl-torrents][github]]
221 *** Mezzano
222 - [[https://github.com/froggey/Mezzano][github]]
223 *** yalo
224 - [[https://github.com/whily/yalo][github]]
225 *** cl-ledger
226 - [[https://github.com/ledger/cl-ledger][github]]
227 *** Lem
228 - [[https://github.com/lem-project/lem][github]]
229 *** kindista
230 - [[https://github.com/kindista/kindista][github]]
231 *** lisp-chat
232 - [[https://github.com/ryukinix/lisp-chat][github]]
233 * Refs
234 #+print_bibliography: