changeset 23: |
adc9ebf28ff0 |
parent: |
1204cefcfd28
|
child: |
82bb06cbf137 |
author: |
Richard Westhaver <ellis@rwest.io> |
date: |
Sun, 16 Jun 2024 01:43:41 -0400 |
permissions: |
-rw-r--r-- |
description: |
hacks |
2 #+date: [2024-03-11 Mon]
3 #+setupfile: ../../clean.theme
4 I recently picked up
[[https://dl.acm.org/doi/10.5555/2462741][Hacker's Delight]] and having a lot of fun with
5 it. It's a collection of bit-manipulation tricks collected by hackers
6 over many years. You can flip open pretty much anywhere in the book
7 and start learn something really cool.
9 There's something about seeing bit strings and assembly code in a book
10 that really catches my attention, this one goes a bit further by even
11 describing a complete RISC (Reduced Instruction Set Computer) we can
12 implement to play around with the various tricks.
14 As an exercise and for fun, I'd like to employ some Lisp-fu here and
15 implement a small VM specifically designed for mangling bits.
17 As a fair warning, I'm not a mathematician and I don't write proofs
18 often. If I get something wrong or there is a better way of doing
21 You can find most of the code from the book
[[https://github.com/hcs0/Hackers-Delight][here]].
24 ** Data Representation 25 We'll be sticking with a 32-bit word length as recommended in the
26 Preface. We will also represent registers as integers whenever
27 possible, instead of say a bit-vector. Without going into too much
28 detail, it's much more efficient to do bitwise ops on integers
29 instead of bit-vectors in Lisp.
31 We need a minimum of 16 general purpose registers, typically of word
32 length, with R0 reserved for a constant 0. To address 16 different
33 registers we actually only need 4 bits - 5-bits if we needed 32, 6 for
36 Floating-point support and special purpose registers are not required.
39 The Hacker's Delight RISC architecture is described in two tables,
40 denoted
=basic RISC= and
=full RISC= respectively.
42 Most instructions take two source registers
=RA= and
=RB= with a
43 destination register
=RT=. The actual general-purpose registers are
44 labelled
=R0= (containg the constant 0) through
=R15=.
46 A 3-Address machine is assumed and some instructions take 16-bit
47 signed or unsigned immediate values - denoted
=I= and
=Iu= 50 #+tblname: Basic Instruction Set (basic RISC)
51 | Opcode Mnemonic | Operands | Description | 52 |-----------------------------------------------------------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------| 53 | add,sub,mul,div,divu,rem,remu | RT,RA,RB | RT <- (op RA RB) | 54 | addi,muli | RT,RA,I | RT <- (op RA I), I is a 16-bit signed immediate-value | 55 | addis | RT,RA,I | RT <- (+ RA (\vert\vert I 0x0000)) | 56 | and,or,xor | RT,RA,RB | RT <- (op RA RB) | 57 | andi,ori,xori | RT,RA,Iu | As above, expect the last operand is a 16-bit unsigned immediate-value | 58 | beq,bne,blt,ble,bgt,bge | RT,target | Branch to target if (op RT) | 59 | bt,bf | RT,target | Branch true/false, same as bne/beq resp | 60 | cmpeq,cmpne,cmplt,cmple,cmpgt,cmpge,cmpltu,cmpleu,cmpgtu,cmpgeu | RT,RA,RB | RT <- (if (op RA RB) 1 0) | 61 | cmpieq,cmpine,cmpilt,cmpile,cmpigt,cmpige | RT,RA,I | Like cmpeq except second comparand is a 16-bit signed immediate-value | 62 | cmpiequ,cmpineu,cmpiltu,cmpileu,cmpigtu,cmpigeu | RT,RA,I | Like cmpltu except second comparand is a 16-bit unsigned immediate-value | 63 | ldbu,ldh,ldhu,ldw | RT,d(RA) | Load an unsigned-byte, signed-halfword, unsigned-halfword, or word into RT from (+ RA d) where d is a 16-bit signed immediate-value | 64 | mulhs,mulhu | RT,RA,RB | RT gets the high-order 32 bits of (* RA RB) | 65 | not | RT,RA | RT <- bitwise one's-complement of RA | 66 | shl,shr,shrs | RT,RA,RB | RT <- RA shifted left or right by rightmost six bits of RB; 0-fill except for shrs, which is sign-fill (shift amount treated modulo 64) | 67 | shli,shri,shrsi | RT,RA,Iu | RT <- RA shifted left or right by 5-bit immediate field | 68 | stb,sth,stw | RS,d(RA) | Store a byte,halfword,word from RS into memory at location (+ RA d) where d is a 16-bit signed immediate-value | 71 #+name: Addition Instructions (full RISC)
72 | Opcode Mnemonic | Operands | Description | 73 |-----------------------------------------------------------------+-----------+--------------------------------------------------------------------------------------------------------| 74 | abs,nabs | RT,RA | RT <- (op RA) | 75 | andc,eqv,nand,nor,orc | RT,RA,RB | RT <- (op RA RB) | 76 | extr | RT,RA,I,L | extract bits I through I+L-1 of RA and place them right-adjusted in RT, with 0-fill | 77 | extrs | RT,RA,I,L | Like extr, but sign-fill | 78 | ins | RT,RA,I,L | Insert bits 0 through L-1 of RA into bits I through I+L-1 of RT | 79 | nlz | RT,RA | RT gets count of leading 0's in RA (0 to 32) | 80 | pop | RT,RA | RT gets the number of 1-bits in RA (0 to 32) | 81 | ldb | RT,d(RA) | Load a signed byte into RT from memory at location (+ RA d) where d is a 16-bit signed immediate value | 82 | moveq,movne,movlt,movle,movgt,movge | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB | 83 | shlr,shrr | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB | 84 | shlri,shrri | RT,RA,Iu | RT <- RA rotate-shifted left or right by the 5-bit immediate field | 85 | trpeq,trpne,trplt,trple,trpgt,trpge,trpltu,trpleu,trpgtu,trpgeu | RA,RB | Trap (interrupt) if (op RA RB) | 86 | trpieq,trpine,trpilt,trpile,trpigt,trpige | RA,I | Trap if (op RA I) where I is a 16-bit signed immediate-value | 87 | trpiequ,trpineu,trpiltu,trpileu,trpigtu,trpigeu | RA,Iu | Trap if (op RA Iu) where Iu is a 16-bit unsigned immediate-value | 89 There is also some extensions, which are like macros that usually
90 expand to a single instruction.
92 #+name: Extended Mnemonics
93 | Extended Mnemonic | Expansion | Description | 94 |-------------------+------------------+---------------------------| 95 | b target | beq R0,target | Unconditional branch | 96 | li RT,I | (addi,addis,ori) | Load immediate | 97 | mov RT,RA | ori RT,RA,0 | Move register RA to RT | 98 | neg RT,RA | sub RT,R0,RA | Negate (two's-complement) | 99 | subi RT,RA,I | addi RT,RA,-I | Subtract immediate | 101 All of these instructions are available on x86,arm,riscv and the likes
102 so no real surprises. We will implement the basic set in Lisp, mapping
103 instructions directly to Lisp functions using macros.
106 We'll build this machine in Lisp and use plenty intrinsics from
107 SBCL. As a starting point I followed Paul Khuong's excellent blog
108 post:
[[https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-code-breadboard/][SBCL: The ultimate assembly code breadboard]].
110 Some things to keep in mind for our machine:
111 - every instruction requires at most two register reads and one
112 register write - good for compilers
113 - every instruction counts as a single cycle
114 - we pay no attention to instruction-level parallelism
118 :header-args: :session t :results none 122 (ql:quickload :prelude)
123 (in-package :std-user)
125 (:use :cl :std :std-user)
126 (:import-from :sb-assem :inst)
127 (:import-from :sb-vm :immediate-constant :registers :zero :ea))
129 ;; (in-package :sb-x86-64-asm)
131 (declaim (optimize (speed 3) (safety 1)))
132 (defconstant +word-size+ 32 "default word size and register length.")
136 #+begin_src lisp :package hakmem 137 (declaim (type (unsigned-byte #.+word-size+) +ro+))
138 (defconstant +r0+ 0 "constant value for register 0")
139 (defvar *stack* (make-array 8 :initial-contents (list sb-vm::r8-tn
147 (defvar *stack-pointer*)
149 (defvar *rax* sb-vm::rax-tn)
150 (defvar *rbx* sb-vm::rax-tn)
151 (defvar *rcx* sb-vm::rax-tn)
152 (defvar *rdx* sb-vm::rax-tn)
154 ;; (@ 0) returns the (current) register for TOS, (@ 1) returns
155 ;; the one just below, etc.
157 (aref *stack* (mod (+ i *stack-pointer*) (length *stack*))))
159 (defvar *code-base* sb-vm::rsi-tn)
160 (defvar *virtual-ip* sb-vm::rdi-tn)
161 (sb-x86-64-asm::get-gpr :qword 4)
162 ;; (sb-vm::immediate-constant-sc 10000)
163 ;; arena vector or list?
164 (defvar *instructions* (make-hash-table :test #'equal))
166 (defvar *primitive-code-offset* (* 64 67))
169 (alloc 0) ;; next free byte
170 (code (make-array *primitive-code-offset* :element-type 'octet)))
172 (defun emit-code (pages emitter)
173 ;; there must be as many code pages as there are stack slots
174 (assert (= (length *stack*) (length pages)))
175 ;; find the rightmost starting point, and align to 16 bytes
176 (let* ((alloc (logandc2 (+ 15 (reduce #'max pages :key #'code-page-alloc))
178 (bytes (loop for i below (length pages)
179 for page = (elt pages i)
180 collect (let ((segment (sb-assem:make-segment))
182 ;; assemble the variant for this value
183 ;; of *stack-pointer* in a fresh code
185 (sb-assem:assemble (segment)
186 ;; but first, insert padding
187 (sb-vm::emit-long-nop segment (- alloc (code-page-alloc page)))
189 ;; tidy up any backreference
190 (sb-assem:finalize-segment segment)
191 ;; then get the (position-independent) machine
192 ;; code as a vector of bytes
193 (sb-assem:segment-contents-as-vector segment)))))
194 ;; finally, copy each machine code sequence to the right code page
195 (map nil (lambda (page bytes)
196 (let ((alloc (code-page-alloc page)))
197 (replace (code-page-code page) bytes :start1 alloc)
198 (assert (<= (+ alloc (length bytes)) (length (code-page-code page))))
199 (setf (code-page-alloc page) (+ alloc (length bytes)))))
201 ;; and return the offset for that code sequence
204 (defun emit-all-code (&rest emitters)
205 (let ((pages (loop repeat (length *stack*)
206 for page = (make-code-page)
207 ;; prefill everything with one-byte NOPs
208 do (fill (code-page-code page) #x90)
210 (values (mapcar (lambda (emitter)
211 (emit-code pages emitter))
215 (defun next (&optional offset)
216 (setf offset (or offset 0)) ; accommodate primops that frob IP
217 (let ((rotation (mod *stack-pointer* (length *stack*))))
218 (inst movzx *rax* (make-ea :dword :base *virtual-ip*
220 (unless (= -4 offset)
221 (inst add *virtual-ip* (+ 4 offset)))
223 (inst add *rax* *code-base*)
224 (inst lea *rax* (make-ea :qword :base *code-base*
226 :disp (* rotation *primitive-code-offset*))))
230 (inst xchg (@ 0) (@ 1)) ; exchange top of stack and stack[1]
236 #+begin_src lisp :package hakmem 238 (defun %parse-reg3 (rt ra rb))
239 (defun %parse-reg2i (rt ra i))
240 (defun %parse-reg2ui (rt ra ui))
241 (defmacro def-inst (name args &body body)
242 ;; todo: compose a function based on regs+args+body
243 `(let ((sc *scratch*)
248 (declare (ignorable sc r0 ra rb rt))
249 (setf (gethash ',name *instructions*) (lambda ,args (progn ,@body)))))
251 (defmacro def-prim (name op)
252 `(def-inst ,name () (setf rt (,op ra rb))))
256 #+begin_src lisp :package hakmem 275 (def-prim and logand)
277 (def-prim xor logxor)
279 (defun get-inst (i) (gethash i *instructions*))
281 (defmacro %inst (i &body args)
282 `(funcall (get-inst ',i) ,@args))
284 (defun list-instructions (&optional (tbl *instructions*))
285 (hash-table-alist tbl))
288 #+name: instruction-list
289 #+begin_src lisp :results replace :exports both :package hakmem 293 #+RESULTS: instruction-list
295 ((XOR . #<FUNCTION (LAMBDA ()) {100E4ED81B}>)
296 (OR . #<FUNCTION (LAMBDA ()) {100E4ED84B}>)
297 (AND . #<FUNCTION (LAMBDA ()) {100E4ED87B}>)
298 (MULI . #<FUNCTION (LAMBDA (I)) {100E4ED8AB}>)
299 (ADDI . #<FUNCTION (LAMBDA (I)) {100E4ED8DB}>)
300 (CMPGE . #<FUNCTION (LAMBDA ()) {100E4ED90B}>)
301 (CMPGT . #<FUNCTION (LAMBDA ()) {100E4ED93B}>)
302 (CMPLE . #<FUNCTION (LAMBDA ()) {100E4ED96B}>)
303 (CMPLT . #<FUNCTION (LAMBDA ()) {100E4ED99B}>)
304 (CMPNE . #<FUNCTION (LAMBDA ()) {100E4ED9CB}>)
305 (CMPEQ . #<FUNCTION (LAMBDA ()) {100E4ED9FB}>)
306 (REM . #<FUNCTION (LAMBDA ()) {100E4EDA2B}>)
307 (DIV . #<FUNCTION (LAMBDA ()) {100E4EDA7B}>)
308 (MUL . #<FUNCTION (LAMBDA ()) {100E4EDABB}>)
309 (SUB . #<FUNCTION (LAMBDA ()) {100E4EDAEB}>)
310 (ADD . #<FUNCTION (LAMBDA ()) {100E4EDB1B}>))