
revision 3: bd85a72319d8
child 5: bb51c61e4d4b
nas-t/notes.org (Sun Nov 05 22:46:15 2023 -0500)
#+TITLE: notes
#+BIBLIOGRAPHY: refs.bib
* File Systems
** BTRFS
#+begin_quote
BTRFS is a Linux filesystem based on copy-on-write, allowing for
efficient snapshots and clones.

It uses B-trees as its main on-disk data structure. The design goal is
to work well for many use cases and workloads. To this end, much
effort has been directed to maintaining even performance as the
filesystem ages, rather than trying to support a particular narrow
benchmark use-case.

Linux filesystems are installed on smartphones as well as enterprise
servers. This entails challenges on many different fronts.

- Scalability :: The filesystem must scale in many dimensions: disk
  space, memory, and CPUs.

- Data integrity :: Losing data is not an option, and much effort is
  expended to safeguard the content. This includes checksums, metadata
  duplication, and RAID support built into the filesystem.

- Disk diversity :: The system should work well with SSDs and hard
  disks. It is also expected to be able to use an array of different
  sized disks, which poses challenges to the RAID and striping
  mechanisms.
#+end_quote
-- [cite/t/f:@btrfs]
*** [2023-08-08 Tue] btrfs performance speculation
  - [[https://www.percona.com/blog/taking-a-look-at-btrfs-for-mysql/]]
    - ZFS outperforms by a wide margin, but the btrfs side may have been
      misconfigured (virtualization with CoW still enabled?)
  - https://www.ctrl.blog/entry/btrfs-vs-ext4-performance.html
    - see the follow-up comment on this post
      - https://www.reddit.com/r/archlinux/comments/o2gc42/is_the_performance_hit_of_btrfs_serious_is_it/
      #+begin_quote
      I’m the author of OP’s first link. I use BtrFS today. I often shift lots of
      de-duplicatable data around, and benefit greatly from file cloning. The data is actually
      the same data that caused the slow performance in the article. BtrFS and file cloning
      now performs this task quicker than a traditional file system. (Hm. It’s time for a
      follow-up article.)

      In a laptop with one drive: it doesn’t matter too much unless you do work that benefit
      from file cloning or snapshots. This will likely require you to adjust your tooling and
      workflow. I’ve had to rewrite the software I use every day to make it take advantage of
      the capabilities of a more modern file system. You won’t benefit much from the data
      recovery and redundancy features unless you’ve got two storage drives in your laptop and
      can setup redundant data copies.

          on similar hardware to mine?

      It’s not a question about your hardware as much as how you use it. The bad performance I
      documented was related to lots and lots of simultaneous random reads and writes. This
      might not be representative of how you use your computer.
      #+end_quote
  - https://dl.acm.org/doi/fullHtml/10.1145/3386362
    - this is about distributed file systems (in this case Ceph) - they argue against
      basing a DFS on ondisk-format filesystems (XFS, ext4) - developed BlueStore as a
      backend, which runs directly on raw storage hardware.
    - this is a good approach, but expensive (2 years in development) and risky
    - a better approach is to take advantage of a powerful enough existing ondisk-FS
      format and pair it with supporting modules which abstract away the 'distributed'
      mechanics.
    - the strategy presented here is critical for enterprise-grade hardware where the
      ondisk filesystem becomes the bottleneck that you're looking to optimize
  - https://lore.kernel.org/lkml/cover.1676908729.git.dsterba@suse.com/
    - linux 6.3 patch series by David Sterba [2023-02-20 Mon]
    - btrfs continues to show improvements in the linux kernel, ironing out the kinks
    - makes it hard to compare benchmarks tho :/
*** MacOS support
- see this WIP kernel extension for macOS: [[https://github.com/relalis/macos-btrfs][macos-btrfs]]
  - maybe we can help out with the VFS/mount support
*** on-disk format
- [[https://btrfs.readthedocs.io/en/latest/dev/On-disk-format.html][on-disk-format]]
- 'btrfs consists entirely of several trees. the trees use copy-on-write.'
- trees are stored in nodes which belong to a level in the b-tree structure.
- internal nodes ('inodes' below, not POSIX inodes) contain refs to nodes on the
  /next/ level, OR
  - to leaf nodes when the level reaches 0.
- leaf nodes contain various item types depending on the tree.
- basic structures (field offsets below are start:length in bytes)
  - 0:8 uint = objectid, each tree has its own set of object IDs
  - 8:1 uint = item type
  - 9:8 uint = offset, depends on type.
  - little-endian
  - fields are unsigned
  - *superblock*
    - primary superblock is located at 0x10000 (64KiB)
    - mirror copies of the superblock are located at physical addresses 0x4000000 (64
      MiB) and 0x4000000000 (256GiB), if valid. copies are updated simultaneously.
    - during mount only the first superblock at 0x10000 is read; an error causes the
      mount to fail.
    - BTRFS only recognizes disks with a valid superblock at 0x10000.
  - *header*
    - stored at the start of every node
    - data following it depends on whether it is an internal or leaf node.
  - *inode* (internal node)
    - node header followed by a number of key pointers
    - 0:0x11 key (17 bytes)
    - 0x11:8 uint = block number
    - 0x19:8 uint = generation
  - *lnode* (leaf node)
    - leaf nodes contain a header followed by items
    - 0:0x11 key (17 bytes)
    - 0x11:4 uint = data offset relative to end of header (header is 0x65 = 101 bytes)
    - 0x15:4 uint = data size
- objects
  - ROOT_TREE
    - holds ROOT_ITEMs, ROOT_REFs, and ROOT_BACKREFs for every tree other than itself.
    - used to find the other trees and to determine the subvol structure.
    - holds items for the 'root tree directory'. laddr is stored in the superblock
  - objectIDs
    - free ids: BTRFS_FIRST_FREE_OBJECTID=256ULL:BTRFS_LAST_FREE_OBJECTID=-256ULL
    - otherwise reserved for internal use
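The key layout noted above (0:8 objectid, 8:1 type, 9:8 offset, little-endian,
unsigned) decodes mechanically; a minimal sketch in Rust, with struct and function
names of my own invention, not from any existing btrfs crate:

#+begin_src rust
/// btrfs disk key layout, per the notes above:
/// bytes 0..8 = objectid (u64), 8 = item type (u8), 9..17 = offset (u64),
/// all fields little-endian and unsigned, 17 bytes total.
#[derive(Debug, PartialEq)]
struct DiskKey {
    objectid: u64,
    item_type: u8,
    offset: u64,
}

fn parse_disk_key(buf: &[u8; 17]) -> DiskKey {
    DiskKey {
        objectid: u64::from_le_bytes(buf[0..8].try_into().unwrap()),
        item_type: buf[8],
        offset: u64::from_le_bytes(buf[9..17].try_into().unwrap()),
    }
}

fn main() {
    // Illustrative bytes only: objectid 256 (the first free id), type 0x84.
    let mut raw = [0u8; 17];
    raw[0..8].copy_from_slice(&256u64.to_le_bytes());
    raw[8] = 0x84;
    println!("{:?}", parse_disk_key(&raw));
}
#+end_src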
*** send-stream format
- [[https://btrfs.readthedocs.io/en/latest/dev/dev-send-stream.html][send stream format]]
- Send stream format represents a linear sequence of commands describing actions to be
  performed on the target filesystem (receive side), created on the source filesystem
  (send side).
- The stream is currently used in two ways: to generate a stream representing a
  standalone subvolume (full mode) or a difference between two snapshots of the same
  subvolume (incremental mode).
- The stream can be generated using a set of other subvolumes to look for extent
  references that could lead to a more efficient stream by transferring only the
  references and not full data.
- The stream format is abstracted from on-disk structures (though it may share some
  BTRFS specifics); the stream instructions could be generated by other means than the
  send ioctl.
- the format is a checksummed header + TLV-encoded attributes
- header: u32 len, u16 cmd, u32 crc32c
- data: attribute type, length, raw data
- the v2 protocol supports the encoded write commands
- the commands are kinda clunky - need to MKFILE/MKDIR then RENAME to create an
  entry at its final path
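The header layout above (u32 length, u16 command, u32 crc32c, little-endian) maps
directly to a 10-byte parse; a minimal sketch in Rust — names are mine, and the crc
is carried through unverified here:

#+begin_src rust
/// Send-stream command header, per the notes above:
/// u32 data length, u16 command id, u32 crc32c, little-endian on the wire.
#[derive(Debug, PartialEq)]
struct CmdHeader {
    len: u32,
    cmd: u16,
    crc: u32,
}

fn parse_cmd_header(buf: &[u8; 10]) -> CmdHeader {
    CmdHeader {
        len: u32::from_le_bytes(buf[0..4].try_into().unwrap()),
        cmd: u16::from_le_bytes(buf[4..6].try_into().unwrap()),
        crc: u32::from_le_bytes(buf[6..10].try_into().unwrap()),
    }
}

fn main() {
    // Illustrative bytes: a command with 16 bytes of TLV data, command id 3.
    let raw = [16, 0, 0, 0, 3, 0, 0, 0, 0, 0];
    println!("{:?}", parse_cmd_header(&raw));
}
#+end_src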
*** [2023-08-09 Wed] ioctls
- magic#: 0x94
  - https://docs.kernel.org/userspace-api/ioctl/ioctl-number.html
  - some Btrfs ioctls have been lifted to VFS/generic code
  - see fs/btrfs/ioctl.h and linux/fs.h
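The magic 0x94 gets combined with a command number, transfer size, and direction by
the kernel's _IOC macros; a sketch of that encoding in Rust, using the generic
asm-generic bit widths — the nr/size values in main are illustrative, not a specific
named btrfs ioctl:

#+begin_src rust
// Generic Linux _IOC encoding: nr in bits 0..8, type in bits 8..16,
// size in bits 16..30, direction in bits 30..32 (2 = read, 1 = write).
const IOC_NRBITS: u32 = 8;
const IOC_TYPEBITS: u32 = 8;
const IOC_SIZEBITS: u32 = 14;
const IOC_READ: u32 = 2;

fn ioc(dir: u32, ty: u32, nr: u32, size: u32) -> u32 {
    (dir << (IOC_NRBITS + IOC_TYPEBITS + IOC_SIZEBITS))
        | (size << (IOC_NRBITS + IOC_TYPEBITS))
        | (ty << IOC_NRBITS)
        | nr
}

fn main() {
    // An _IOR(0x94, 25, u64)-shaped request number using the btrfs magic.
    println!("{:#010x}", ioc(IOC_READ, 0x94, 25, 8));
}
#+end_src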
** ZFS
-- [cite/t/f:@zfs]

- core component of TrueNAS software
** TMPFS
-- [cite/t/f:@tmpfs]
- in-mem FS
** EXT4
-- [cite/t/f:@ext4]
** XFS
-- [cite/t/f:@xfs]
-- [cite/t/f:@xfs-scalability]
* Storage Mediums
** HDD
-- [cite/t/f:@hd-failure-ml]
** SSD
-- [cite/t/f:@smart-ssd-qp]
-- [cite/t/f:@ssd-perf-opt]

** Flash
-- [cite/t/f:@flash-openssd-systems]
** NVMe
-- [cite/t/f:@nvme-ssd-ux]
-- [[https://nvmexpress.org/specifications/][specifications]]
*** ZNS
-- [cite/t/f:@zns-usenix]
#+begin_quote
Zoned Storage is an open source, standards-based initiative to enable data centers to
scale efficiently for the zettabyte storage capacity era. There are two technologies
behind Zoned Storage, Shingled Magnetic Recording (SMR) in ATA/SCSI HDDs and Zoned
Namespaces (ZNS) in NVMe SSDs.
#+end_quote
-- [[https://zonedstorage.io/][zonedstorage.io]]
-- $465 for 7.68TB 2.5"? [[https://www.serversupply.com/SSD/PCI-E/7.68TB/WESTERN%20DIGITAL/WUS4BB076D7P3E3_332270.htm][retail]]
** eMMC
-- [cite/t/f:@emmc-mobile-io]
* Linux
** syscalls
*** ioctl
- [[https://elixir.bootlin.com/linux/latest/source/Documentation/userspace-api/ioctl/ioctl-number.rst][ioctl-numbers]]
* Rust
** crates
*** nix
- [[https://crates.io/crates/nix][crates.io]]
*** memmap2
- [[https://crates.io/crates/memmap2][crates.io]]
*** zstd
- [[https://crates.io/crates/zstd][crates.io]]
*** rocksdb
- [[https://crates.io/crates/rocksdb][crates.io]]
*** tokio                                                           :tokio:
- [[https://crates.io/crates/tokio][crates.io]]
*** tracing                                                         :tokio:
- [[https://crates.io/crates/tracing][crates.io]]
**** tracing-subscriber
- [[https://crates.io/crates/tracing-subscriber][crates.io]]
*** axum                                                            :tokio:
- [[https://crates.io/crates/axum][crates.io]]
*** tower                                                           :tokio:
- [[https://crates.io/crates/tower][crates.io]]
*** uuid
- [[https://crates.io/crates/uuid][crates.io]]
** unstable
*** lazy_cell
- [[https://github.com/rust-lang/rust/issues/109736][tracking-issue]]
*** {BTreeMap,BTreeSet}::extract_if
- [[https://github.com/rust-lang/rust/issues/70530][tracking-issue]]
* Lisp
** ASDF
- [[https://gitlab.common-lisp.net/asdf/asdf][gitlab.common-lisp.net]]
- [[https://asdf.common-lisp.dev/][common-lisp.dev]]
- [[https://github.com/fare/asdf/blob/master/doc/best_practices.md][best-practices]]
- includes UIOP
** Reference Projects
*** StumpWM
- [[https://github.com/stumpwm/stumpwm][github]]
*** Nyxt
- [[https://github.com/atlas-engineer/nyxt][github]]
*** Kons-9
- [[https://github.com/kaveh808/kons-9][github]]
*** cl-torrents
- [[https://github.com/vindarel/cl-torrents][github]]
*** Mezzano
- [[https://github.com/froggey/Mezzano][github]]
*** yalo
- [[https://github.com/whily/yalo][github]]
*** cl-ledger
- [[https://github.com/ledger/cl-ledger][github]]
*** Lem
- [[https://github.com/lem-project/lem][github]]
*** kindista
- [[https://github.com/kindista/kindista][github]]
*** lisp-chat
- [[https://github.com/ryukinix/lisp-chat][github]]
* Refs
#+print_bibliography: