changeset 7: |
d543f73892d3 |
parent 6: |
008f9709e728 |
child 8: |
6ac37a61456a |
author: |
Richard Westhaver <ellis@rwest.io> |
date: |
Sat, 22 Jun 2024 23:56:08 -0400 |
files: |
nas-t.org |
description: |
add nas-t notes |
1.1--- /dev/null Thu Jan 01 00:00:00 1970 +0000
1.2+++ b/nas-t.org Sat Jun 22 23:56:08 2024 -0400
1.3@@ -0,0 +1,234 @@
1.4+#+BIBLIOGRAPHY: refs.bib
1.5+* File Systems
1.6+** BTRFS
1.7+#+begin_quote
1.8+BTRFS is a Linux filesystem based on copy-on-write, allowing for
1.9+efficient snapshots and clones.
1.10+
1.11+It uses B-trees as its main on-disk data structure. The design goal is
1.12+to work well for many use cases and workloads. To this end, much
1.13+effort has been directed to maintaining even performance as the
1.14+filesystem ages, rather than trying to support a particular narrow
1.15+benchmark use-case.
1.16+
1.17+Linux filesystems are installed on smartphones as well as enterprise
1.18+servers. This entails challenges on many different fronts.
1.19+
1.20+- Scalability :: The filesystem must scale in many dimensions: disk
1.21+ space, memory, and CPUs.
1.22+
1.23+- Data integrity :: Losing data is not an option, and much effort is
1.24+ expended to safeguard the content. This includes checksums, metadata
1.25+ duplication, and RAID support built into the filesystem.
1.26+
1.27+- Disk diversity :: The system should work well with SSDs and hard
1.28+ disks. It is also expected to be able to use an array of different
1.29+ sized disks, which poses challenges to the RAID and striping
1.30+ mechanisms.
1.31+#+end_quote
1.32+-- [cite/t/f:@btrfs]
1.33+*** [2023-08-08 Tue] btrfs performance speculation ::
1.34+ - [[https://www.percona.com/blog/taking-a-look-at-btrfs-for-mysql/]]
1.35+  - zfs outperforms immensely, though possibly due to misconfiguration on the btrfs
1.36+    side (CoW still enabled under virtualization?)
1.37+ - https://www.ctrl.blog/entry/btrfs-vs-ext4-performance.html
1.38+ - see the follow up comment on this post
1.39+ - https://www.reddit.com/r/archlinux/comments/o2gc42/is_the_performance_hit_of_btrfs_serious_is_it/
1.40+ #+begin_quote
1.41+ I’m the author of OP’s first link. I use BtrFS today. I often shift lots of
1.42+ de-duplicatable data around, and benefit greatly from file cloning. The data is actually
1.43+ the same data that caused the slow performance in the article. BtrFS and file cloning
1.44+ now performs this task quicker than a traditional file system. (Hm. It’s time for a
1.45+ follow-up article.)
1.46+
1.47+ In a laptop with one drive: it doesn’t matter too much unless you do work that benefit
1.48+ from file cloning or snapshots. This will likely require you to adjust your tooling and
1.49+ workflow. I’ve had to rewrite the software I use every day to make it take advantage of
1.50+ the capabilities of a more modern file system. You won’t benefit much from the data
1.51+ recovery and redundancy features unless you’ve got two storage drives in your laptop and
1.52+ can setup redundant data copies.
1.53+
1.54+ on similar hardware to mine?
1.55+
1.56+ It’s not a question about your hardware as much as how you use it. The bad performance I
1.57+ documented was related to lots and lots of simultaneous random reads and writes. This
1.58+ might not be representative of how you use your computer.
1.59+ #+end_quote
1.60+ - https://dl.acm.org/doi/fullHtml/10.1145/3386362
1.61+  - this is about distributed file systems (in this case Ceph) - they argue against
1.62+    basing a DFS on ondisk-format filesystems (XFS, ext4) - they developed BlueStore
1.63+    as a backend, which runs directly on raw storage hardware.
1.64+ - this is a good approach, but expensive (2 years in development) and risky
1.65+ - better approach is to take advantage of a powerful enough existing ondisk-FS
1.66+ format and pair it with supporting modules which abstract away the 'distributed'
1.67+ mechanics.
1.68+ - the strategy presented here is critical for enterprise-grade hardware where the
1.69+ ondisk filesystem becomes the bottleneck that you're looking to optimize
1.70+ - https://lore.kernel.org/lkml/cover.1676908729.git.dsterba@suse.com/
1.71+ - linux 6.3 patch by David Sterba [2023-02-20 Mon]
1.72+ - btrfs continues to show improvements in the linux kernel, ironing out the kinks
1.73+ - makes it hard to compare benchmarks tho :/
1.74+*** MacOS support
1.75+- see this WIP k-ext for macos: [[https://github.com/relalis/macos-btrfs][macos-btrfs]]
1.76+ - maybe we can help out with the VFS/mount support
1.77+*** on-disk format
1.78+- [[https://btrfs.readthedocs.io/en/latest/dev/On-disk-format.html][on-disk-format]]
1.79+- 'btrfs consists entirely of several trees. the trees use copy-on-write.'
1.80+- trees are stored in nodes which belong to a level in the b-tree structure.
1.81+- internal nodes (inodes) contain refs to other inodes on the /next/ level OR
1.82+  - to leaf nodes when the level reaches 0.
1.83+- leaf nodes contain various types depending on the tree.
1.84+- basic structures (=offset:size= pairs below; offsets in hex, sizes in bytes)
1.85+  - 0x0:8 uint = objectid, each tree has its own set of object IDs
1.86+  - 0x8:1 uint = item type
1.87+  - 0x9:8 uint = offset, meaning depends on type.
1.88+  - little-endian
1.89+  - fields are unsigned
1.90+  - *superblock*
1.91+    - primary superblock is located at 0x10000 (64 KiB)
1.92+    - mirror copies of the superblock are located at physical addresses 0x4000000 (64
1.93+      MiB) and 0x4000000000 (256 GiB), if valid. copies are updated simultaneously.
1.94+    - during mount only the first superblock at 0x10000 is read; an error causes the
1.95+      mount to fail.
1.96+    - BTRFS only recognizes disks with a valid superblock at 0x10000.
1.97+  - *header*
1.98+    - stored at the start of every node
1.99+    - the data following it depends on whether the node is internal or a leaf.
1.100+  - *inode* (internal node)
1.101+    - node header followed by a number of key pointers
1.102+    - 0x0:0x11 key (17 bytes)
1.103+    - 0x11:8 uint = block number
1.104+    - 0x19:8 uint = generation
1.105+  - *lnode* (leaf node)
1.106+    - node header followed by a number of items
1.107+    - 0x0:0x11 key (17 bytes)
1.108+    - 0x11:4 uint = data offset, relative to end of header (0x65 = 101 bytes)
1.109+    - 0x15:4 uint = data size
1.110+- objects
1.111+ - ROOT_TREE
1.112+ - holds ROOT_ITEMs, ROOT_REFs, and ROOT_BACKREFs for every tree other than itself.
1.113+ - used to find the other trees and to determine the subvol structure.
1.114+    - holds items for the 'root tree directory'. its laddr is stored in the superblock
1.115+ - objectIDs
1.116+ - free ids: BTRFS_FIRST_FREE_OBJECTID=256ULL:BTRFS_LAST_FREE_OBJECTID=-256ULL
1.117+      - IDs outside this range are reserved for internal use
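The key layout and superblock offsets above can be sketched in Rust. This is a minimal sketch based only on the notes here; the struct and constant names are mine, not the ones in the btrfs sources:

#+begin_src rust
// Sketch of the 17-byte disk key described above.
// All fields are unsigned and little-endian.
#[derive(Debug, PartialEq)]
struct DiskKey {
    objectid: u64, // 0x0:8, each tree has its own set of object IDs
    item_type: u8, // 0x8:1
    offset: u64,   // 0x9:8, meaning depends on item_type
}

// Superblock locations from the notes above.
const SUPERBLOCK_PRIMARY: u64 = 0x10000;      // 64 KiB
const SUPERBLOCK_MIRROR1: u64 = 0x4000000;    // 64 MiB
const SUPERBLOCK_MIRROR2: u64 = 0x4000000000; // 256 GiB

fn parse_key(buf: &[u8; 17]) -> DiskKey {
    DiskKey {
        objectid: u64::from_le_bytes(buf[0..8].try_into().unwrap()),
        item_type: buf[8],
        offset: u64::from_le_bytes(buf[9..17].try_into().unwrap()),
    }
}

fn main() {
    let mut raw = [0u8; 17];
    // BTRFS_FIRST_FREE_OBJECTID = 256
    raw[0..8].copy_from_slice(&256u64.to_le_bytes());
    raw[8] = 1;
    let key = parse_key(&raw);
    assert_eq!(key.objectid, 256);
    // sanity check on the mirror placement: 256 GiB / 64 MiB = 4096
    assert_eq!(SUPERBLOCK_MIRROR2 / SUPERBLOCK_MIRROR1, 4096);
    assert_eq!(SUPERBLOCK_PRIMARY, 64 * 1024);
    println!("{:?}", key);
}
#+end_src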
1.118+*** send-stream format
1.119+- [[https://btrfs.readthedocs.io/en/latest/dev/dev-send-stream.html][send stream format]]
1.120+- Send stream format represents a linear sequence of commands describing actions to be
1.121+ performed on the target filesystem (receive side), created on the source filesystem
1.122+ (send side).
1.123+- The stream is currently used in two ways: to generate a stream representing a
1.124+ standalone subvolume (full mode) or a difference between two snapshots of the same
1.125+ subvolume (incremental mode).
1.126+- The stream can be generated using a set of other subvolumes to look for extent
1.127+ references that could lead to a more efficient stream by transferring only the
1.128+ references and not full data.
1.129+- The stream format is abstracted from on-disk structures (though it may share some
1.130+ BTRFS specifics), the stream instructions could be generated by other means than the
1.131+ send ioctl.
1.132+- it's a checksummed TLV stream
1.133+- command header: u32 len, u16 cmd, u32 crc32c
1.134+- data: TLV attributes (type, length, raw data)
1.135+- the v2 protocol adds support for the encoded write commands
1.136+- the commands are kinda clunky - need to MKFILE/MKDIR then RENAME to create
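A hedged sketch of the command framing described above: a 10-byte little-endian header (u32 len, u16 cmd, u32 crc32c) followed by len bytes of TLV attributes. Names are mine, and the crc32c computation itself is omitted:

#+begin_src rust
// Send-stream command header per the notes above: u32 len, u16 cmd, u32 crc32c,
// followed by `len` bytes of TLV-encoded attributes.
#[derive(Debug, PartialEq)]
struct CmdHeader {
    len: u32, // length of the attribute payload, excluding this header
    cmd: u16, // command id (mkfile/mkdir/rename/...)
    crc: u32, // crc32c of the command; not verified in this sketch
}

fn parse_cmd_header(buf: &[u8]) -> Option<CmdHeader> {
    if buf.len() < 10 {
        return None; // truncated stream
    }
    Some(CmdHeader {
        len: u32::from_le_bytes(buf[0..4].try_into().ok()?),
        cmd: u16::from_le_bytes(buf[4..6].try_into().ok()?),
        crc: u32::from_le_bytes(buf[6..10].try_into().ok()?),
    })
}

fn main() {
    // A fabricated example frame: 4-byte payload, cmd = 1, crc left at zero.
    let mut frame: Vec<u8> = Vec::new();
    frame.extend_from_slice(&4u32.to_le_bytes());
    frame.extend_from_slice(&1u16.to_le_bytes());
    frame.extend_from_slice(&0u32.to_le_bytes());
    frame.extend_from_slice(&[0xde, 0xad, 0xbe, 0xef]);
    let hdr = parse_cmd_header(&frame).unwrap();
    assert_eq!(hdr.len, 4);
    assert_eq!(hdr.cmd, 1);
    println!("{:?}", hdr);
}
#+end_src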
1.137+*** [2023-08-09 Wed] ioctls
1.138+- magic#: 0x94
1.139+ - https://docs.kernel.org/userspace-api/ioctl/ioctl-number.html
1.140+  - some Btrfs ioctls have been lifted to VFS/generic equivalents
1.141+  - see fs/btrfs/ioctl.h and linux/fs.h
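The magic number 0x94 is combined with a direction, command number, and argument size to form the full request number, per the kernel's _IOC bit packing (nr in bits 0-7, type in 8-15, size in 16-29, direction in 30-31). A sketch of that packing; the command nr and size used in the example are hypothetical, not real btrfs ioctls:

#+begin_src rust
// Linux _IOC request-number packing, with the btrfs ioctl magic 0x94.
const IOC_NRBITS: u32 = 8;
const IOC_TYPEBITS: u32 = 8;
const IOC_SIZEBITS: u32 = 14;

const IOC_NRSHIFT: u32 = 0;
const IOC_TYPESHIFT: u32 = IOC_NRSHIFT + IOC_NRBITS; // 8
const IOC_SIZESHIFT: u32 = IOC_TYPESHIFT + IOC_TYPEBITS; // 16
const IOC_DIRSHIFT: u32 = IOC_SIZESHIFT + IOC_SIZEBITS; // 30

const IOC_NONE: u32 = 0;
const IOC_WRITE: u32 = 1;
const IOC_READ: u32 = 2;

const BTRFS_IOCTL_MAGIC: u32 = 0x94;

// Equivalent of the kernel's _IOC(dir, type, nr, size) macro.
fn ioc(dir: u32, ty: u32, nr: u32, size: u32) -> u32 {
    (dir << IOC_DIRSHIFT)
        | (ty << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
        | (size << IOC_SIZESHIFT)
}

fn main() {
    // Hypothetical write-direction ioctl nr 1 with a 4096-byte argument.
    let req = ioc(IOC_WRITE, BTRFS_IOCTL_MAGIC, 1, 4096);
    assert_eq!(req, (1 << 30) | (4096 << 16) | (0x94 << 8) | 1);
    // IOC_READ and IOC_NONE pack the same way, only the dir bits change.
    assert_eq!(ioc(IOC_NONE, BTRFS_IOCTL_MAGIC, 2, 0), 0x9402);
    let _ = IOC_READ;
    println!("{:#010x}", req);
}
#+end_src

Knowing this packing makes the magic-number table easy to cross-check against the request constants that show up in strace output.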
1.142+** ZFS
1.143+-- [cite/t/f:@zfs]
1.144+
1.145+- core component of TrueNAS software
1.146+** TMPFS
1.147+-- [cite/t/f:@tmpfs]
1.148+- in-mem FS
1.149+** EXT4
1.150+-- [cite/t/f:@ext4]
1.151+** XFS
1.152+-- [cite/t/f:@xfs]
1.153+-- [cite/t/f:@xfs-scalability]
1.154+* Storage Mediums
1.155+** HDD
1.156+-- [cite/t/f:@hd-failure-ml]
1.157+** SSD
1.158+-- [cite/t/f:@smart-ssd-qp]
1.159+-- [cite/t/f:@ssd-perf-opt]
1.160+
1.161+** Flash
1.162+-- [cite/t/f:@flash-openssd-systems]
1.163+** NVMe
1.164+-- [cite/t/f:@nvme-ssd-ux]
1.165+-- [[https://nvmexpress.org/specifications/][specifications]]
1.166+*** ZNS
1.167+-- [cite/t/f:@zns-usenix]
1.168+#+begin_quote
1.169+Zoned Storage is an open source, standards-based initiative to enable data centers to
1.170+scale efficiently for the zettabyte storage capacity era. There are two technologies
1.171+behind Zoned Storage, Shingled Magnetic Recording (SMR) in ATA/SCSI HDDs and Zoned
1.172+Namespaces (ZNS) in NVMe SSDs.
1.173+#+end_quote
1.174+-- [[https://zonedstorage.io/][zonedstorage.io]]
1.175+-- $465 7.68TB 2.5"? [[https://www.serversupply.com/SSD/PCI-E/7.68TB/WESTERN%20DIGITAL/WUS4BB076D7P3E3_332270.htm][retail]]
1.176+** eMMC
1.177+-- [cite/t/f:@emmc-mobile-io]
1.178+* Linux
1.179+** syscalls
1.180+*** ioctl
1.181+- [[https://elixir.bootlin.com/linux/latest/source/Documentation/userspace-api/ioctl/ioctl-number.rst][ioctl-numbers]]
1.182+* Rust
1.183+** crates
1.184+*** nix
1.185+- [[https://crates.io/crates/nix][crates.io]]
1.186+*** memmap2
1.187+- [[https://crates.io/crates/memmap2][crates.io]]
1.188+*** zstd
1.189+- [[https://crates.io/crates/zstd][crates.io]]
1.190+*** rocksdb
1.191+- [[https://crates.io/crates/rocksdb][crates.io]]
1.192+*** tokio :tokio:
1.193+- [[https://crates.io/crates/tokio][crates.io]]
1.194+*** tracing :tokio:
1.195+- [[https://crates.io/crates/tracing][crates.io]]
1.196+**** tracing-subscriber
1.197+- [[https://crates.io/crates/tracing-subscriber][crates.io]]
1.198+*** axum :tokio:
1.199+- [[https://crates.io/crates/axum][crates.io]]
1.200+*** tower :tokio:
1.201+- [[https://crates.io/crates/tower][crates.io]]
1.202+*** uuid
1.203+- [[https://crates.io/crates/uuid][crates.io]]
1.204+** unstable
1.205+*** lazy_cell
1.206+- [[https://github.com/rust-lang/rust/issues/109736][tracking-issue]]
1.207+*** {BTreeMap,BTreeSet}::extract_if
1.208+- [[https://github.com/rust-lang/rust/issues/70530][tracking-issue]]
1.209+* Lisp
1.210+** ASDF
1.211+- [[https://gitlab.common-lisp.net/asdf/asdf][gitlab.common-lisp.net]]
1.212+- [[https://asdf.common-lisp.dev/][common-lisp.dev]]
1.213+- [[https://github.com/fare/asdf/blob/master/doc/best_practices.md][best-practices]]
1.214+- includes UIOP
1.215+** Reference Projects
1.216+*** StumpWM
1.217+- [[https://github.com/stumpwm/stumpwm][github]]
1.218+*** Nyxt
1.219+- [[https://github.com/atlas-engineer/nyxt][github]]
1.220+*** Kons-9
1.221+- [[https://github.com/kaveh808/kons-9][github]]
1.222+*** cl-torrents
1.223+- [[https://github.com/vindarel/cl-torrents][github]]
1.224+*** Mezzano
1.225+- [[https://github.com/froggey/Mezzano][github]]
1.226+*** yalo
1.227+- [[https://github.com/whily/yalo][github]]
1.228+*** cl-ledger
1.229+- [[https://github.com/ledger/cl-ledger][github]]
1.230+*** Lem
1.231+- [[https://github.com/lem-project/lem][github]]
1.232+*** kindista
1.233+- [[https://github.com/kindista/kindista][github]]
1.234+*** lisp-chat
1.235+- [[https://github.com/ryukinix/lisp-chat][github]]
1.236+* Refs
1.237+#+print_bibliography: