
Mercurial > org > notes / changeset: add nas-t notes

changeset 7: d543f73892d3
parent 6: 008f9709e728
child 8: 6ac37a61456a
author: Richard Westhaver <ellis@rwest.io>
date: Sat, 22 Jun 2024 23:56:08 -0400
files: nas-t.org
description: add nas-t notes
     1.1--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2+++ b/nas-t.org	Sat Jun 22 23:56:08 2024 -0400
     1.3@@ -0,0 +1,234 @@
     1.4+#+BIBLIOGRAPHY: refs.bib
     1.5+* File Systems
     1.6+** BTRFS
     1.7+#+begin_quote
     1.8+BTRFS is a Linux filesystem based on copy-on-write, allowing for
     1.9+efficient snapshots and clones.
    1.10+
    1.11+It uses B-trees as its main on-disk data structure. The design goal is
    1.12+to work well for many use cases and workloads. To this end, much
    1.13+effort has been directed to maintaining even performance as the
    1.14+filesystem ages, rather than trying to support a particular narrow
    1.15+benchmark use-case.
    1.16+
    1.17+Linux filesystems are installed on smartphones as well as enterprise
    1.18+servers. This entails challenges on many different fronts.
    1.19+
    1.20+- Scalability :: The filesystem must scale in many dimensions: disk
    1.21+  space, memory, and CPUs.
    1.22+
    1.23+- Data integrity :: Losing data is not an option, and much effort is
    1.24+  expended to safeguard the content. This includes checksums, metadata
    1.25+  duplication, and RAID support built into the filesystem.
    1.26+
    1.27+- Disk diversity :: The system should work well with SSDs and hard
    1.28+  disks. It is also expected to be able to use an array of different
    1.29+  sized disks, which poses challenges to the RAID and striping
    1.30+  mechanisms.
    1.31+#+end_quote
    1.32+-- [cite/t/f:@btrfs]
     1.33+*** [2023-08-08 Tue] btrfs performance speculation
    1.34+  - [[https://www.percona.com/blog/taking-a-look-at-btrfs-for-mysql/]]
    1.35+    - zfs outperforms immensely, but potential misconfiguration on btrfs side (virt+cow
    1.36+      still enabled?)
    1.37+  - https://www.ctrl.blog/entry/btrfs-vs-ext4-performance.html
    1.38+    - see the follow up comment on this post
    1.39+      - https://www.reddit.com/r/archlinux/comments/o2gc42/is_the_performance_hit_of_btrfs_serious_is_it/
    1.40+            #+begin_quote
    1.41+      I’m the author of OP’s first link. I use BtrFS today. I often shift lots of
    1.42+      de-duplicatable data around, and benefit greatly from file cloning. The data is actually
    1.43+      the same data that caused the slow performance in the article. BtrFS and file cloning
    1.44+      now performs this task quicker than a traditional file system. (Hm. It’s time for a
    1.45+      follow-up article.)
    1.46+
    1.47+      In a laptop with one drive: it doesn’t matter too much unless you do work that benefit
    1.48+      from file cloning or snapshots. This will likely require you to adjust your tooling and
    1.49+      workflow. I’ve had to rewrite the software I use every day to make it take advantage of
    1.50+      the capabilities of a more modern file system. You won’t benefit much from the data
    1.51+      recovery and redundancy features unless you’ve got two storage drives in your laptop and
    1.52+      can setup redundant data copies.
    1.53+
    1.54+          on similar hardware to mine?
    1.55+
    1.56+      It’s not a question about your hardware as much as how you use it. The bad performance I
    1.57+      documented was related to lots and lots of simultaneous random reads and writes. This
    1.58+      might not be representative of how you use your computer.
    1.59+            #+end_quote
    1.60+  - https://dl.acm.org/doi/fullHtml/10.1145/3386362
    1.61+    - this is about distributed file systems (in this case Ceph) - they argue against
     1.62+      basing a DFS on general-purpose on-disk filesystems (XFS, ext4) - developed BlueStore as
    1.63+      backend, which runs directly on raw storage hardware.
    1.64+    - this is a good approach, but expensive (2 years in development) and risky
    1.65+    - better approach is to take advantage of a powerful enough existing ondisk-FS
    1.66+      format and pair it with supporting modules which abstract away the 'distributed'
    1.67+      mechanics.
    1.68+    - the strategy presented here is critical for enterprise-grade hardware where the
    1.69+      ondisk filesystem becomes the bottleneck that you're looking to optimize
    1.70+  - https://lore.kernel.org/lkml/cover.1676908729.git.dsterba@suse.com/
    1.71+    - linux 6.3 patch by David Sterba [2023-02-20 Mon]
    1.72+    - btrfs continues to show improvements in the linux kernel, ironing out the kinks
    1.73+    - makes it hard to compare benchmarks tho :/
    1.74+*** MacOS support
    1.75+- see this WIP k-ext for macos: [[https://github.com/relalis/macos-btrfs][macos-btrfs]]
    1.76+  - maybe we can help out with the VFS/mount support
    1.77+*** on-disk format
    1.78+- [[https://btrfs.readthedocs.io/en/latest/dev/On-disk-format.html][on-disk-format]]
    1.79+- 'btrfs consists entirely of several trees. the trees use copy-on-write.'
    1.80+- trees are stored in nodes which belong to a level in the b-tree structure.
     1.81+- internal nodes (inodes) contain refs to other inodes on the /next/ level OR
     1.82+  - to leaf nodes when the level reaches 0.
    1.83+- leaf nodes contain various types depending on the tree.
    1.84+- basic structures
    1.85+  - 0:8 uint = objectid, each tree has its own set of object IDs
    1.86+  - 8:1 uint = item type
    1.87+  - 9:8 uint = offset, depends on type.
    1.88+  - little-endian
    1.89+  - fields are unsigned
    1.90+  - *superblock*
    1.91+    - primary superblock is located at 0x10000 (64KiB)
    1.92+    - Mirror copies of the superblock are located at physical addresses 0x4000000 (64
     1.93+      MiB) and 0x4000000000 (256 GiB), if valid. copies are updated simultaneously.
    1.94+    - during mount only the first super block at 0x10000 is read, error causes mount to
    1.95+      fail.
     1.96+    - BTRFS only recognizes disks with a valid 0x10000 superblock.
    1.97+  - *header*
    1.98+    - stored at the start of every inode
    1.99+    - data following it depends on whether it is an internal or leaf node.
   1.100+  - *inode*
   1.101+    - node header followed by a number of key pointers
   1.102+    - 0:11 key
   1.103+    - 11:8 uint = block number
   1.104+    - 19:8 uint = generation
   1.105+  - *lnode*
    1.106+    - leaf nodes contain a header followed by items
   1.107+    - 0:11 key
    1.108+    - 11:4 uint = data offset relative to end of header (0x65 = 101 bytes)
   1.109+    - 15:4 uint = data size
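The key layouts noted above can be sketched as plain byte parsing. A minimal sketch; struct and function names are illustrative, not the kernel's. Note the doc's offsets are hex, so `0:11` means a disk key is 0x11 = 17 bytes (8-byte objectid, 1-byte type, 8-byte offset), all fields little-endian and unsigned:

```rust
// Illustrative parse of a btrfs disk key and an internal-node key
// pointer (key + block number + generation), per the offsets above.

#[derive(Debug, PartialEq)]
struct DiskKey {
    objectid: u64,  // each tree has its own set of object IDs
    item_type: u8,  // item type
    offset: u64,    // meaning depends on item type
}

fn parse_disk_key(buf: &[u8]) -> DiskKey {
    DiskKey {
        objectid: u64::from_le_bytes(buf[0..8].try_into().unwrap()),
        item_type: buf[8],
        offset: u64::from_le_bytes(buf[9..17].try_into().unwrap()),
    }
}

// Key pointer in an internal node: 17-byte key, then two u64s.
fn parse_key_ptr(buf: &[u8]) -> (DiskKey, u64, u64) {
    let key = parse_disk_key(&buf[0..17]);
    let blockptr = u64::from_le_bytes(buf[17..25].try_into().unwrap());
    let generation = u64::from_le_bytes(buf[25..33].try_into().unwrap());
    (key, blockptr, generation)
}

fn main() {
    // Fabricated 33-byte key pointer: objectid 256 (first free id), type 1.
    let mut buf = [0u8; 33];
    buf[0..8].copy_from_slice(&256u64.to_le_bytes());
    buf[8] = 1;
    let (key, blockptr, generation) = parse_key_ptr(&buf);
    assert_eq!(key.objectid, 256);
    assert_eq!(key.item_type, 1);
    assert_eq!((blockptr, generation), (0, 0));
    println!("{:?}", key);
}
```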
   1.110+- objects
   1.111+  - ROOT_TREE
   1.112+    - holds ROOT_ITEMs, ROOT_REFs, and ROOT_BACKREFs for every tree other than itself.
   1.113+    - used to find the other trees and to determine the subvol structure.
    1.114+    - holds items for the 'root tree directory'. laddr is stored in the superblock
   1.115+  - objectIDs
   1.116+    - free ids: BTRFS_FIRST_FREE_OBJECTID=256ULL:BTRFS_LAST_FREE_OBJECTID=-256ULL
    1.117+    - otherwise reserved for internal use
   1.118+*** send-stream format
   1.119+- [[https://btrfs.readthedocs.io/en/latest/dev/dev-send-stream.html][send stream format]]
   1.120+- Send stream format represents a linear sequence of commands describing actions to be
   1.121+  performed on the target filesystem (receive side), created on the source filesystem
   1.122+  (send side).
   1.123+- The stream is currently used in two ways: to generate a stream representing a
   1.124+  standalone subvolume (full mode) or a difference between two snapshots of the same
   1.125+  subvolume (incremental mode).
   1.126+- The stream can be generated using a set of other subvolumes to look for extent
   1.127+  references that could lead to a more efficient stream by transferring only the
   1.128+  references and not full data.
   1.129+- The stream format is abstracted from on-disk structures (though it may share some
   1.130+  BTRFS specifics), the stream instructions could be generated by other means than the
   1.131+  send ioctl.
   1.132+- it's a checksum+TLV
   1.133+- header: u32len,u16cmd,u32crc32c
   1.134+- data: type,length,raw data
   1.135+- the v2 protocol supports the encoded commands
    1.136+- the commands are kinda clunky - need to MKFILE/MKDIR then RENAME to create
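The checksum+TLV framing above can be sketched as a small parser. A minimal sketch based only on the layout noted (u32 length, u16 command, u32 crc32c, little-endian, followed by `len` bytes of attributes); names are illustrative:

```rust
// Illustrative parse of a send-stream command header:
// u32 len, u16 cmd, u32 crc32c, all little-endian (10 bytes total).

#[derive(Debug)]
struct CmdHeader {
    len: u32, // length of the attribute data that follows
    cmd: u16, // command id
    crc: u32, // crc32c over the command, crc field zeroed
}

fn parse_cmd_header(buf: &[u8]) -> CmdHeader {
    CmdHeader {
        len: u32::from_le_bytes(buf[0..4].try_into().unwrap()),
        cmd: u16::from_le_bytes(buf[4..6].try_into().unwrap()),
        crc: u32::from_le_bytes(buf[6..10].try_into().unwrap()),
    }
}

fn main() {
    // A fabricated header: 4 data bytes, command 2, crc left zero.
    let mut buf = [0u8; 10];
    buf[0..4].copy_from_slice(&4u32.to_le_bytes());
    buf[4..6].copy_from_slice(&2u16.to_le_bytes());
    let h = parse_cmd_header(&buf);
    assert_eq!((h.len, h.cmd, h.crc), (4, 2, 0));
    println!("cmd {} with {} data bytes", h.cmd, h.len);
}
```

The attributes inside the payload follow the same shape (type, length, raw data), so the same `from_le_bytes` pattern applies per attribute.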
   1.137+*** [2023-08-09 Wed] ioctls
   1.138+- magic#: 0x94 
   1.139+  - https://docs.kernel.org/userspace-api/ioctl/ioctl-number.html
    1.140+  - Btrfs-specific ioctls, some since lifted to VFS/generic
   1.141+  - fs/btrfs/ioctl.h and linux/fs.h
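The 0x94 magic slots into the kernel's ioctl number encoding (dir, size, type, nr packed into a u32). A minimal sketch of that encoding; the kernel defines BTRFS_IOC_SYNC as _IO(0x94, 8):

```rust
// Linux ioctl number layout, low bits first:
// nr in bits 0..8, type in 8..16, size in 16..30, dir in 30..32.

const IOC_NRBITS: u32 = 8;
const IOC_TYPEBITS: u32 = 8;
const IOC_SIZEBITS: u32 = 14;

const fn ioc(dir: u32, ty: u32, nr: u32, size: u32) -> u32 {
    (dir << (IOC_NRBITS + IOC_TYPEBITS + IOC_SIZEBITS))
        | (size << (IOC_NRBITS + IOC_TYPEBITS))
        | (ty << IOC_NRBITS)
        | nr
}

// _IO: no data transfer, so dir and size are zero.
const fn io(ty: u32, nr: u32) -> u32 {
    ioc(0, ty, nr, 0)
}

fn main() {
    // _IO(0x94, 8) == BTRFS_IOC_SYNC
    assert_eq!(io(0x94, 8), 0x9408);
    println!("{:#x}", io(0x94, 8));
}
```

The nix crate wraps the same arithmetic in its `ioctl_*!` macros, which is the usual way to issue these from Rust.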
   1.142+** ZFS
   1.143+-- [cite/t/f:@zfs]
   1.144+
   1.145+- core component of TrueNAS software
   1.146+** TMPFS
   1.147+-- [cite/t/f:@tmpfs]
   1.148+- in-mem FS
   1.149+** EXT4
   1.150+-- [cite/t/f:@ext4]
   1.151+** XFS
   1.152+-- [cite/t/f:@xfs]
   1.153+-- [cite/t/f:@xfs-scalability]
   1.154+* Storage Mediums
   1.155+** HDD
   1.156+-- [cite/t/f:@hd-failure-ml]
   1.157+** SSD
   1.158+-- [cite/t/f:@smart-ssd-qp]
   1.159+-- [cite/t/f:@ssd-perf-opt]
   1.160+
   1.161+** Flash
   1.162+-- [cite/t/f:@flash-openssd-systems]
   1.163+** NVMe
   1.164+-- [cite/t/f:@nvme-ssd-ux]
   1.165+-- [[https://nvmexpress.org/specifications/][specifications]]
   1.166+*** ZNS
   1.167+-- [cite/t/f:@zns-usenix]
   1.168+#+begin_quote
   1.169+Zoned Storage is an open source, standards-based initiative to enable data centers to
   1.170+scale efficiently for the zettabyte storage capacity era. There are two technologies
   1.171+behind Zoned Storage, Shingled Magnetic Recording (SMR) in ATA/SCSI HDDs and Zoned
   1.172+Namespaces (ZNS) in NVMe SSDs.
   1.173+#+end_quote
   1.174+-- [[https://zonedstorage.io/][zonedstorage.io]]
   1.175+-- $465 8tb 2.5"? [[https://www.serversupply.com/SSD/PCI-E/7.68TB/WESTERN%20DIGITAL/WUS4BB076D7P3E3_332270.htm][retail]]
   1.176+** eMMC
   1.177+-- [cite/t/f:@emmc-mobile-io]
   1.178+* Linux
   1.179+** syscalls
   1.180+*** ioctl
   1.181+- [[https://elixir.bootlin.com/linux/latest/source/Documentation/userspace-api/ioctl/ioctl-number.rst][ioctl-numbers]]
   1.182+* Rust
   1.183+** crates
   1.184+*** nix
   1.185+- [[https://crates.io/crates/nix][crates.io]]
   1.186+*** memmap2
   1.187+- [[https://crates.io/crates/memmap2][crates.io]]
   1.188+*** zstd
   1.189+- [[https://crates.io/crates/zstd][crates.io]]
   1.190+*** rocksdb
   1.191+- [[https://crates.io/crates/rocksdb][crates.io]]
   1.192+*** tokio                                                           :tokio:
   1.193+- [[https://crates.io/crates/tokio][crates.io]]
   1.194+*** tracing                                                         :tokio:
   1.195+- [[https://crates.io/crates/tracing][crates.io]]
   1.196+**** tracing-subscriber
   1.197+- [[https://crates.io/crates/tracing-subscriber][crates.io]]
   1.198+*** axum                                                            :tokio:
   1.199+- [[https://crates.io/crates/axum][crates.io]]
   1.200+*** tower                                                           :tokio:
   1.201+- [[https://crates.io/crates/tower][crates.io]]
   1.202+*** uuid
   1.203+- [[https://crates.io/crates/uuid][crates.io]]
   1.204+** unstable
   1.205+*** lazy_cell
   1.206+- [[https://github.com/rust-lang/rust/issues/109736][tracking-issue]]
   1.207+*** {BTreeMap,BTreeSet}::extract_if
   1.208+- [[https://github.com/rust-lang/rust/issues/70530][tracking-issue]]
   1.209+* Lisp
   1.210+** ASDF
   1.211+- [[https://gitlab.common-lisp.net/asdf/asdf][gitlab.common-lisp.net]]
   1.212+- [[https://asdf.common-lisp.dev/][common-lisp.dev]]
   1.213+- [[https://github.com/fare/asdf/blob/master/doc/best_practices.md][best-practices]]
   1.214+- includes UIOP
   1.215+** Reference Projects
   1.216+*** StumpWM
   1.217+- [[https://github.com/stumpwm/stumpwm][github]]
   1.218+*** Nyxt
   1.219+- [[https://github.com/atlas-engineer/nyxt][github]]
   1.220+*** Kons-9
   1.221+- [[https://github.com/kaveh808/kons-9][github]]
   1.222+*** cl-torrents
   1.223+- [[https://github.com/vindarel/cl-torrents][github]]
   1.224+*** Mezzano
   1.225+- [[https://github.com/froggey/Mezzano][github]]
   1.226+*** yalo
   1.227+- [[https://github.com/whily/yalo][github]]
   1.228+*** cl-ledger
   1.229+- [[https://github.com/ledger/cl-ledger][github]]
   1.230+*** Lem
   1.231+- [[https://github.com/lem-project/lem][github]]
   1.232+*** kindista
   1.233+- [[https://github.com/kindista/kindista][github]]
   1.234+*** lisp-chat
   1.235+- [[https://github.com/ryukinix/lisp-chat][github]]
   1.236+* Refs
   1.237+#+print_bibliography: