changelog shortlog graph tags branches changeset files revisions annotate raw help

Mercurial > org > blog / draft/a-bit-of-risc.org

changeset 28: 6d54ccb29de4
parent: db5ece2206cf
author: Richard Westhaver <ellis@rwest.io>
date: Sun, 18 Aug 2024 22:16:12 -0400
permissions: -rw-r--r--
description: weekend warrior
1 #+title: A Bit of RISC
2 #+date: [2024-03-11 Mon]
3 #+setupfile: ../../clean.theme
4 #+property: header-args :eval no-export
5 I recently picked up [[https://dl.acm.org/doi/10.5555/2462741][Hacker's Delight]] and having a lot of fun with
6 it. It's a collection of bit-manipulation tricks collected by hackers
7 over many years. You can flip open pretty much anywhere in the book
8 and start learn something really cool.
9 
10 There's something about seeing bit strings and assembly code in a book
11 that really catches my attention, this one goes a bit further by even
12 describing a complete RISC (Reduced Instruction Set Computer) we can
13 implement to play around with the various tricks.
14 
15 As an exercise and for fun, I'd like to employ some Lisp-fu here and
16 implement a small VM specifically designed for mangling bits.
17 
18 As a fair warning, I'm not a mathematician and I don't write proofs
19 often. If I get something wrong or there is a better way of doing
20 things, let me know!
21 
22 You can find most of the code from the book [[https://github.com/hcs0/Hackers-Delight][here]].
23 
24 * Design
25 :PROPERTIES:
26 :ID: c030566f-1ede-4edc-b393-d2c6c1c2d19f
27 :END:
28 ** Data Representation
29 :PROPERTIES:
30 :ID: 6d93e084-1047-4a33-adb3-5c90af96ead0
31 :END:
32 We'll be sticking with a 32-bit word length as recommended in the
33 Preface. We will also represent registers as integers whenever
34 possible, instead of say a bit-vector. Without going into too much
35 detail, it's much more efficient to do bitwise ops on integers
36 instead of bit-vectors in Lisp.
37 
38 We need a minimum of 16 general purpose registers, typically of word
39 length, with R0 reserved for a constant 0. To address 16 different
40 registers we actually only need 4 bits - 5-bits if we needed 32, 6 for
41 64, etc.
42 
43 Floating-point support and special purpose registers are not required.
44 
45 ** Instructions
46 :PROPERTIES:
47 :ID: fb86a212-b49a-44ff-b30b-32c789ca55e2
48 :END:
49 The Hacker's Delight RISC architecture is described in two tables,
50 denoted =basic RISC= and =full RISC= respectively.
51 
52 Most instructions take two source registers =RA= and =RB= with a
53 destination register =RT=. The actual general-purpose registers are
54 labelled =R0= (containg the constant 0) through =R15=.
55 
56 A 3-Address machine is assumed and some instructions take 16-bit
57 signed or unsigned immediate values - denoted =I= and =Iu=
58 respectively.
59 
60 #+tblname: Basic Instruction Set (basic RISC)
61 | Opcode Mnemonic | Operands | Description |
62 |-----------------------------------------------------------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------|
63 | add,sub,mul,div,divu,rem,remu | RT,RA,RB | RT <- (op RA RB) |
64 | addi,muli | RT,RA,I | RT <- (op RA I), I is a 16-bit signed immediate-value |
65 | addis | RT,RA,I | RT <- (+ RA (\vert\vert I 0x0000)) |
66 | and,or,xor | RT,RA,RB | RT <- (op RA RB) |
67 | andi,ori,xori | RT,RA,Iu | As above, expect the last operand is a 16-bit unsigned immediate-value |
68 | beq,bne,blt,ble,bgt,bge | RT,target | Branch to target if (op RT) |
69 | bt,bf | RT,target | Branch true/false, same as bne/beq resp |
70 | cmpeq,cmpne,cmplt,cmple,cmpgt,cmpge,cmpltu,cmpleu,cmpgtu,cmpgeu | RT,RA,RB | RT <- (if (op RA RB) 1 0) |
71 | cmpieq,cmpine,cmpilt,cmpile,cmpigt,cmpige | RT,RA,I | Like cmpeq except second comparand is a 16-bit signed immediate-value |
72 | cmpiequ,cmpineu,cmpiltu,cmpileu,cmpigtu,cmpigeu | RT,RA,I | Like cmpltu except second comparand is a 16-bit unsigned immediate-value |
73 | ldbu,ldh,ldhu,ldw | RT,d(RA) | Load an unsigned-byte, signed-halfword, unsigned-halfword, or word into RT from (+ RA d) where d is a 16-bit signed immediate-value |
74 | mulhs,mulhu | RT,RA,RB | RT gets the high-order 32 bits of (* RA RB) |
75 | not | RT,RA | RT <- bitwise one's-complement of RA |
76 | shl,shr,shrs | RT,RA,RB | RT <- RA shifted left or right by rightmost six bits of RB; 0-fill except for shrs, which is sign-fill (shift amount treated modulo 64) |
77 | shli,shri,shrsi | RT,RA,Iu | RT <- RA shifted left or right by 5-bit immediate field |
78 | stb,sth,stw | RS,d(RA) | Store a byte,halfword,word from RS into memory at location (+ RA d) where d is a 16-bit signed immediate-value |
79 
80 
81 #+name: Addition Instructions (full RISC)
82 | Opcode Mnemonic | Operands | Description |
83 |-----------------------------------------------------------------+-----------+--------------------------------------------------------------------------------------------------------|
84 | abs,nabs | RT,RA | RT <- (op RA) |
85 | andc,eqv,nand,nor,orc | RT,RA,RB | RT <- (op RA RB) |
86 | extr | RT,RA,I,L | extract bits I through I+L-1 of RA and place them right-adjusted in RT, with 0-fill |
87 | extrs | RT,RA,I,L | Like extr, but sign-fill |
88 | ins | RT,RA,I,L | Insert bits 0 through L-1 of RA into bits I through I+L-1 of RT |
89 | nlz | RT,RA | RT gets count of leading 0's in RA (0 to 32) |
90 | pop | RT,RA | RT gets the number of 1-bits in RA (0 to 32) |
91 | ldb | RT,d(RA) | Load a signed byte into RT from memory at location (+ RA d) where d is a 16-bit signed immediate value |
92 | moveq,movne,movlt,movle,movgt,movge | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB |
93 | shlr,shrr | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB |
94 | shlri,shrri | RT,RA,Iu | RT <- RA rotate-shifted left or right by the 5-bit immediate field |
95 | trpeq,trpne,trplt,trple,trpgt,trpge,trpltu,trpleu,trpgtu,trpgeu | RA,RB | Trap (interrupt) if (op RA RB) |
96 | trpieq,trpine,trpilt,trpile,trpigt,trpige | RA,I | Trap if (op RA I) where I is a 16-bit signed immediate-value |
97 | trpiequ,trpineu,trpiltu,trpileu,trpigtu,trpigeu | RA,Iu | Trap if (op RA Iu) where Iu is a 16-bit unsigned immediate-value |
98 
99 There is also some extensions, which are like macros that usually
100 expand to a single instruction.
101 
102 #+name: Extended Mnemonics
103 | Extended Mnemonic | Expansion | Description |
104 |-------------------+------------------+---------------------------|
105 | b target | beq R0,target | Unconditional branch |
106 | li RT,I | (addi,addis,ori) | Load immediate |
107 | mov RT,RA | ori RT,RA,0 | Move register RA to RT |
108 | neg RT,RA | sub RT,R0,RA | Negate (two's-complement) |
109 | subi RT,RA,I | addi RT,RA,-I | Subtract immediate |
110 
111 All of these instructions are available on x86,arm,riscv and the likes
112 so no real surprises. We will implement the basic set in Lisp, mapping
113 instructions directly to Lisp functions using macros.
114 
115 ** Execution Model
116 :PROPERTIES:
117 :ID: f84fcd3d-86ad-4b9b-a330-4572d6120559
118 :END:
119 We'll build this machine in Lisp and use plenty intrinsics from
120 SBCL. As a starting point I followed Paul Khuong's excellent blog
121 post: [[https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-code-breadboard/][SBCL: The ultimate assembly code breadboard]].
122 
123 Some things to keep in mind for our machine:
124 - every instruction requires at most two register reads and one
125  register write - good for compilers
126 - every instruction counts as a single cycle
127 - we pay no attention to instruction-level parallelism
128 
129 * The HAKMEM VM
130 :properties:
131 :header-args: :session t :results none :noeval t
132 :ID: 43484c58-ab29-481c-b2b9-17ab4d91e22f
133 :end:
134 #+name: defpackage
135 #+begin_src lisp
136  (ql:quickload :prelude)
137  (in-package :std-user)
138  (defpackage :hakmem
139  (:use :cl :std :std-user)
140  (:import-from :sb-assem :inst)
141  (:import-from :sb-vm :immediate-constant :registers :zero :ea))
142  (in-package :hakmem)
143  ;; (in-package :sb-x86-64-asm)
144  ;; (in-readtable :std)
145  (declaim (optimize (speed 3) (safety 1)))
146 
147  (eval-always
148  (defconstant +word-size+ 32 "default word size and register length."))
149 #+end_src
150 
151 #+name: vars
152 #+begin_src lisp :package hakmem
153  (declaim (type (unsigned-byte #.+word-size+) +ro+))
154  (defconstant +r0+ 0 "constant value for register 0")
155  (defvar *stack* (make-array 8 :initial-contents (list sb-vm::r8-tn
156  sb-vm::r9-tn
157  sb-vm::r10-tn
158  sb-vm::r11-tn
159  sb-vm::r12-tn
160  sb-vm::r13-tn
161  sb-vm::r14-tn
162  sb-vm::r15-tn)))
163  (defvar *stack-pointer*)
164 
165  (defvar *rax* sb-vm::rax-tn)
166  (defvar *rbx* sb-vm::rax-tn)
167  (defvar *rcx* sb-vm::rax-tn)
168  (defvar *rdx* sb-vm::rax-tn)
169 
170  ;; (@ 0) returns the (current) register for TOS, (@ 1) returns
171  ;; the one just below, etc.
172  (defun @ (i)
173  (aref *stack* (mod (+ i *stack-pointer*) (length *stack*))))
174 
175  (defvar *code-base* sb-vm::rsi-tn)
176  (defvar *virtual-ip* sb-vm::rdi-tn)
177  (sb-x86-64-asm::get-gpr :qword 4)
178  ;; (sb-vm::immediate-constant-sc 10000)
179  ;; arena vector or list?
180  (defvar *instructions* (make-hash-table :test #'equal))
181 
182  (defvar *primitive-code-offset* (* 64 67))
183 
184  (defstruct code-page
185  (alloc 0) ;; next free byte
186  (code (make-array *primitive-code-offset* :element-type 'octet)))
187 
188  (defun emit-code (pages emitter)
189  ;; there must be as many code pages as there are stack slots
190  (assert (= (length *stack*) (length pages)))
191  ;; find the rightmost starting point, and align to 16 bytes
192  (let* ((alloc (logandc2 (+ 15 (reduce #'max pages :key #'code-page-alloc))
193  15))
194  (bytes (loop for i below (length pages)
195  for page = (elt pages i)
196  collect (let ((segment (sb-assem:make-segment))
197  (*stack-pointer* i))
198  ;; assemble the variant for this value
199  ;; of *stack-pointer* in a fresh code
200  ;; segment
201  (sb-assem:assemble (segment)
202  ;; but first, insert padding
203  (sb-vm::emit-long-nop segment (- alloc (code-page-alloc page)))
204  (funcall emitter))
205  ;; tidy up any backreference
206  (sb-assem:finalize-segment segment)
207  ;; then get the (position-independent) machine
208  ;; code as a vector of bytes
209  (sb-assem:segment-contents-as-vector segment)))))
210  ;; finally, copy each machine code sequence to the right code page
211  (map nil (lambda (page bytes)
212  (let ((alloc (code-page-alloc page)))
213  (replace (code-page-code page) bytes :start1 alloc)
214  (assert (<= (+ alloc (length bytes)) (length (code-page-code page))))
215  (setf (code-page-alloc page) (+ alloc (length bytes)))))
216  pages bytes)
217  ;; and return the offset for that code sequence
218  alloc))
219 
220  (defun emit-all-code (&rest emitters)
221  (let ((pages (loop repeat (length *stack*)
222  for page = (make-code-page)
223  ;; prefill everything with one-byte NOPs
224  do (fill (code-page-code page) #x90)
225  collect page)))
226  (values (mapcar (lambda (emitter)
227  (emit-code pages emitter))
228  emitters)
229  pages)))
230 
231  (defun next (&optional offset)
232  (setf offset (or offset 0)) ; accommodate primops that frob IP
233  (let ((rotation (mod *stack-pointer* (length *stack*))))
234  (inst movzx *rax* (make-ea :dword :base *virtual-ip*
235  :disp offset))
236  (unless (= -4 offset)
237  (inst add *virtual-ip* (+ 4 offset)))
238  (if (zerop rotation)
239  (inst add *rax* *code-base*)
240  (inst lea *rax* (make-ea :qword :base *code-base*
241  :index *rax*
242  :disp (* rotation *primitive-code-offset*))))
243  (inst jmp *rax*)))
244 
245  (defun swap ()
246  (inst xchg (@ 0) (@ 1)) ; exchange top of stack and stack[1]
247  (next))
248 
249 #+end_src
250 
251 #+name: instructions
252 #+begin_src lisp :package hakmem
253  ;; todo
254  (defun %parse-reg3 (rt ra rb))
255  (defun %parse-reg2i (rt ra i))
256  (defun %parse-reg2ui (rt ra ui))
257  (defmacro def-inst (name args &body body)
258  ;; todo: compose a function based on regs+args+body
259  `(let ((sc *scratch*)
260  (r0 +r0+)
261  (ra 0)
262  (rb 0)
263  (rt 0))
264  (declare (ignorable sc r0 ra rb rt))
265  (setf (gethash ',name *instructions*) (lambda ,args (progn ,@body)))))
266 
267  (defmacro def-prim (name op)
268  `(def-inst ,name () (setf rt (,op ra rb))))
269 #+end_src
270 
271 #+name: prims
272 #+begin_src lisp :package hakmem
273  (def-prim add +)
274  (def-prim sub -)
275  (def-prim mul *)
276  (def-prim div /)
277  ;; divu
278  (def-prim rem mod)
279  ;; remu
280  (def-prim cmpeq =)
281  (def-prim cmpne /=)
282  (def-prim cmplt <)
283  (def-prim cmple <=)
284  (def-prim cmpgt >)
285  (def-prim cmpge >=)
286  ;; ltu leu gtu geu
287  (def-inst addi (i)
288  (setf rt (+ ra i)))
289  (def-inst muli (i)
290  (setf rt (* ra i)))
291  (def-prim and logand)
292  (def-prim or logior)
293  (def-prim xor logxor)
294 
295  (defun get-inst (i) (gethash i *instructions*))
296 
297  (defmacro %inst (i &body args)
298  `(funcall (get-inst ',i) ,@args))
299 
300  (defun list-instructions (&optional (tbl *instructions*))
301  (hash-table-alist tbl))
302 
303 #+end_src
304 #+name: instruction-list
305 #+begin_src lisp :results replace :exports both :package hakmem
306  (list-instructions)
307 #+end_src
308 
309 #+RESULTS: instruction-list
310 #+begin_example
311 ((XOR . #<FUNCTION (LAMBDA ()) {101CF0367B}>)
312  (OR . #<FUNCTION (LAMBDA ()) {101CF036AB}>)
313  (AND . #<FUNCTION (LAMBDA ()) {101CF036DB}>)
314  (MULI . #<FUNCTION (LAMBDA (I)) {101CF0370B}>)
315  (ADDI . #<FUNCTION (LAMBDA (I)) {101CF0373B}>)
316  (CMPGE . #<FUNCTION (LAMBDA ()) {101CF0376B}>)
317  (CMPGT . #<FUNCTION (LAMBDA ()) {101CF0379B}>)
318  (CMPLE . #<FUNCTION (LAMBDA ()) {101CF037CB}>)
319  (CMPLT . #<FUNCTION (LAMBDA ()) {101CF037FB}>)
320  (CMPNE . #<FUNCTION (LAMBDA ()) {101CF0382B}>)
321  (CMPEQ . #<FUNCTION (LAMBDA ()) {101CF0385B}>)
322  (REM . #<FUNCTION (LAMBDA ()) {101CF0388B}>)
323  (DIV . #<FUNCTION (LAMBDA ()) {101CF038DB}>)
324  (MUL . #<FUNCTION (LAMBDA ()) {101CF0391B}>)
325  (SUB . #<FUNCTION (LAMBDA ()) {101CF0394B}>)
326  (ADD . #<FUNCTION (LAMBDA ()) {101CF0397B}>))
327 #+end_example