changelog shortlog graph tags branches changeset files revisions annotate raw help

Mercurial > org > blog / draft/a-bit-of-risc.org

changeset 18: d77884ec2b44
child: 889759cafcc2
author: Richard Westhaver <ellis@rwest.io>
date: Sun, 28 Apr 2024 19:49:20 -0400
permissions: -rw-r--r--
description: drafts
1 #+title: A Bit of RISC
2 #+date: [2024-03-11 Mon]
3 I recently picked up [[https://dl.acm.org/doi/10.5555/2462741][Hacker's Delight]] and having a lot of fun with
4 it. It's a collection of bit-manipulation tricks collected by hackers
5 over many years. You can flip open pretty much anywhere in the book
6 and start learn something really cool.
7 
8 There's something about seeing bit strings and assembly code in a book
9 that really catches my attention, this one goes a bit further by even
10 describing a complete RISC (Reduced Instruction Set Computer) we can
11 implement to play around with the various tricks.
12 
13 As an exercise and for fun, I'd like to employ some Lisp-fu here and
14 implement a small VM specifically designed for mangling bits.
15 
16 As a fair warning, I'm not a mathematician and I don't write proofs
17 often. If I get something wrong or there is a better way of doing
18 things, let me know!
19 
20 You can find most of the code from the book [[https://github.com/hcs0/Hackers-Delight][here]].
21 
22 * Design
23 ** Data Representation
24 We'll be sticking with a 32-bit word length as recommended in the
25 Preface. We will also represent registers as integers whenever
26 possible, instead of say a bit-vector. Without going into too much
27 detail, it's much more efficient to do bitwise ops on integers
28 instead of bit-vectors in Lisp.
29 
30 We need a minimum of 16 general purpose registers, typically of word
31 length, with R0 reserved for a constant 0. To address 16 different
32 registers we actually only need 4 bits - 5-bits if we needed 32, 6 for
33 64, etc.
34 
35 Floating-point support and special purpose registers are not required.
36 
37 ** Instructions
38 The Hacker's Delight RISC architecture is described in two tables,
39 denoted =basic RISC= and =full RISC= respectively.
40 
41 Most instructions take two source registers =RA= and =RB= with a
42 destination register =RT=. The actual general-purpose registers are
43 labelled =R0= (containg the constant 0) through =R15=.
44 
45 A 3-Address machine is assumed and some instructions take 16-bit
46 signed or unsigned immediate values - denoted =I= and =Iu=
47 respectively.
48 
49 #+tblname: Basic Instruction Set (basic RISC)
50 | Opcode Mnemonic | Operands | Description |
51 |-----------------------------------------------------------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------|
52 | add,sub,mul,div,divu,rem,remu | RT,RA,RB | RT <- (op RA RB) |
53 | addi,muli | RT,RA,I | RT <- (op RA I), I is a 16-bit signed immediate-value |
54 | addis | RT,RA,I | RT <- (+ RA (\vert\vert I 0x0000)) |
55 | and,or,xor | RT,RA,RB | RT <- (op RA RB) |
56 | andi,ori,xori | RT,RA,Iu | As above, expect the last operand is a 16-bit unsigned immediate-value |
57 | beq,bne,blt,ble,bgt,bge | RT,target | Branch to target if (op RT) |
58 | bt,bf | RT,target | Branch true/false, same as bne/beq resp |
59 | cmpeq,cmpne,cmplt,cmple,cmpgt,cmpge,cmpltu,cmpleu,cmpgtu,cmpgeu | RT,RA,RB | RT <- (if (op RA RB) 1 0) |
60 | cmpieq,cmpine,cmpilt,cmpile,cmpigt,cmpige | RT,RA,I | Like cmpeq except second comparand is a 16-bit signed immediate-value |
61 | cmpiequ,cmpineu,cmpiltu,cmpileu,cmpigtu,cmpigeu | RT,RA,I | Like cmpltu except second comparand is a 16-bit unsigned immediate-value |
62 | ldbu,ldh,ldhu,ldw | RT,d(RA) | Load an unsigned-byte, signed-halfword, unsigned-halfword, or word into RT from (+ RA d) where d is a 16-bit signed immediate-value |
63 | mulhs,mulhu | RT,RA,RB | RT gets the high-order 32 bits of (* RA RB) |
64 | not | RT,RA | RT <- bitwise one's-complement of RA |
65 | shl,shr,shrs | RT,RA,RB | RT <- RA shifted left or right by rightmost six bits of RB; 0-fill except for shrs, which is sign-fill (shift amount treated modulo 64) |
66 | shli,shri,shrsi | RT,RA,Iu | RT <- RA shifted left or right by 5-bit immediate field |
67 | stb,sth,stw | RS,d(RA) | Store a byte,halfword,word from RS into memory at location (+ RA d) where d is a 16-bit signed immediate-value |
68 
69 
70 #+name: Addition Instructions (full RISC)
71 | Opcode Mnemonic | Operands | Description |
72 |-----------------------------------------------------------------+-----------+--------------------------------------------------------------------------------------------------------|
73 | abs,nabs | RT,RA | RT <- (op RA) |
74 | andc,eqv,nand,nor,orc | RT,RA,RB | RT <- (op RA RB) |
75 | extr | RT,RA,I,L | extract bits I through I+L-1 of RA and place them right-adjusted in RT, with 0-fill |
76 | extrs | RT,RA,I,L | Like extr, but sign-fill |
77 | ins | RT,RA,I,L | Insert bits 0 through L-1 of RA into bits I through I+L-1 of RT |
78 | nlz | RT,RA | RT gets count of leading 0's in RA (0 to 32) |
79 | pop | RT,RA | RT gets the number of 1-bits in RA (0 to 32) |
80 | ldb | RT,d(RA) | Load a signed byte into RT from memory at location (+ RA d) where d is a 16-bit signed immediate value |
81 | moveq,movne,movlt,movle,movgt,movge | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB |
82 | shlr,shrr | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB |
83 | shlri,shrri | RT,RA,Iu | RT <- RA rotate-shifted left or right by the 5-bit immediate field |
84 | trpeq,trpne,trplt,trple,trpgt,trpge,trpltu,trpleu,trpgtu,trpgeu | RA,RB | Trap (interrupt) if (op RA RB) |
85 | trpieq,trpine,trpilt,trpile,trpigt,trpige | RA,I | Trap if (op RA I) where I is a 16-bit signed immediate-value |
86 | trpiequ,trpineu,trpiltu,trpileu,trpigtu,trpigeu | RA,Iu | Trap if (op RA Iu) where Iu is a 16-bit unsigned immediate-value |
87 
88 There is also some extensions, which are like macros that usually
89 expand to a single instruction.
90 
91 #+name: Extended Mnemonics
92 | Extended Mnemonic | Expansion | Description |
93 |-------------------+------------------+---------------------------|
94 | b target | beq R0,target | Unconditional branch |
95 | li RT,I | (addi,addis,ori) | Load immediate |
96 | mov RT,RA | ori RT,RA,0 | Move register RA to RT |
97 | neg RT,RA | sub RT,R0,RA | Negate (two's-complement) |
98 | subi RT,RA,I | addi RT,RA,-I | Subtract immediate |
99 
100 All of these instructions are available on x86,arm,riscv and the likes
101 so no real surprises. We will implement the basic set in Lisp, mapping
102 instructions directly to Lisp functions using macros.
103 
104 ** Execution Model
105 We'll build this machine in Lisp and use plenty intrinsics from
106 SBCL. As a starting point I followed Paul Khuong's excellent blog
107 post: [[https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-code-breadboard/][SBCL: The ultimate assembly code breadboard]].
108 
109 Some things to keep in mind for our machine:
110 - every instruction requires at most two register reads and one
111  register write - good for compilers
112 - every instruction counts as a single cycle
113 - we pay no attention to instruction-level parallelism
114 
115 * The HAKMEM VM
116 :properties:
117 :header-args: :session t :results none
118 :end:
119 #+name: defpackage
120 #+begin_src lisp
121  (ql:quickload :prelude)
122  (in-package :std-user)
123  (defpackage :hakmem
124  (:use :cl :std)
125  (:import-from :sb-assem :inst)
126  (:import-from :sb-vm :immediate-constant :registers :zero :ea))
127  (in-package :hakmem)
128  ;; (in-package :sb-x86-64-asm)
129  (in-readtable :std)
130  (declaim (optimize (speed 3) (safety 1)))
131  (defconstant +word-size+ 32 "default word size and register length.")
132 #+end_src
133 
134 #+name: vars
135 #+begin_src lisp :package hakmem
136  (declaim (type (unsigned-byte #.+word-size+) +ro+))
137  (defconstant +r0+ 0 "constant value for register 0")
138  (defvar *stack* (make-array 8 :initial-contents (list sb-vm::r8-tn
139  sb-vm::r9-tn
140  sb-vm::r10-tn
141  sb-vm::r11-tn
142  sb-vm::r12-tn
143  sb-vm::r13-tn
144  sb-vm::r14-tn
145  sb-vm::r15-tn)))
146  (defvar *stack-pointer*)
147 
148  (defvar *rax* sb-vm::rax-tn)
149  (defvar *rbx* sb-vm::rax-tn)
150  (defvar *rcx* sb-vm::rax-tn)
151  (defvar *rdx* sb-vm::rax-tn)
152 
153  ;; (@ 0) returns the (current) register for TOS, (@ 1) returns
154  ;; the one just below, etc.
155  (defun @ (i)
156  (aref *stack* (mod (+ i *stack-pointer*) (length *stack*))))
157 
158  (defvar *code-base* sb-vm::rsi-tn)
159  (defvar *virtual-ip* sb-vm::rdi-tn)
160  (sb-x86-64-asm::get-gpr :qword 4)
161  ;; (sb-vm::immediate-constant-sc 10000)
162  ;; arena vector or list?
163  (defvar *instructions* (make-hash-table :test #'equal))
164 
165  (defvar *primitive-code-offset* (* 64 67))
166 
167  (defstruct code-page
168  (alloc 0) ;; next free byte
169  (code (make-array *primitive-code-offset* :element-type 'octet)))
170 
171  (defun emit-code (pages emitter)
172  ;; there must be as many code pages as there are stack slots
173  (assert (= (length *stack*) (length pages)))
174  ;; find the rightmost starting point, and align to 16 bytes
175  (let* ((alloc (logandc2 (+ 15 (reduce #'max pages :key #'code-page-alloc))
176  15))
177  (bytes (loop for i below (length pages)
178  for page = (elt pages i)
179  collect (let ((segment (sb-assem:make-segment))
180  (*stack-pointer* i))
181  ;; assemble the variant for this value
182  ;; of *stack-pointer* in a fresh code
183  ;; segment
184  (sb-assem:assemble (segment)
185  ;; but first, insert padding
186  (sb-vm::emit-long-nop segment (- alloc (code-page-alloc page)))
187  (funcall emitter))
188  ;; tidy up any backreference
189  (sb-assem:finalize-segment segment)
190  ;; then get the (position-independent) machine
191  ;; code as a vector of bytes
192  (sb-assem:segment-contents-as-vector segment)))))
193  ;; finally, copy each machine code sequence to the right code page
194  (map nil (lambda (page bytes)
195  (let ((alloc (code-page-alloc page)))
196  (replace (code-page-code page) bytes :start1 alloc)
197  (assert (<= (+ alloc (length bytes)) (length (code-page-code page))))
198  (setf (code-page-alloc page) (+ alloc (length bytes)))))
199  pages bytes)
200  ;; and return the offset for that code sequence
201  alloc))
202 
203  (defun emit-all-code (&rest emitters)
204  (let ((pages (loop repeat (length *stack*)
205  for page = (make-code-page)
206  ;; prefill everything with one-byte NOPs
207  do (fill (code-page-code page) #x90)
208  collect page)))
209  (values (mapcar (lambda (emitter)
210  (emit-code pages emitter))
211  emitters)
212  pages)))
213 
214  (defun next (&optional offset)
215  (setf offset (or offset 0)) ; accommodate primops that frob IP
216  (let ((rotation (mod *stack-pointer* (length *stack*))))
217  (inst movzx *rax* (make-ea :dword :base *virtual-ip*
218  :disp offset))
219  (unless (= -4 offset)
220  (inst add *virtual-ip* (+ 4 offset)))
221  (if (zerop rotation)
222  (inst add *rax* *code-base*)
223  (inst lea *rax* (make-ea :qword :base *code-base*
224  :index *rax*
225  :disp (* rotation *primitive-code-offset*))))
226  (inst jmp *rax*)))
227 
228  (defun swap ()
229  (inst xchg (@ 0) (@ 1)) ; exchange top of stack and stack[1]
230  (next))
231 
232 #+end_src
233 
234 #+name: instructions
235 #+begin_src lisp :package hakmem
236  ;; todo
237  (defun %parse-reg3 (rt ra rb))
238  (defun %parse-reg2i (rt ra i))
239  (defun %parse-reg2ui (rt ra ui))
240  (defmacro def-inst (name args &body body)
241  ;; todo: compose a function based on regs+args+body
242  `(let ((sc *scratch*)
243  (r0 +r0+)
244  (ra 0)
245  (rb 0)
246  (rt 0))
247  (declare (ignorable sc r0 ra rb rt))
248  (setf (gethash ',name *instructions*) (lambda ,args (progn ,@body)))))
249 
250  (defmacro def-prim (name op)
251  `(def-inst ,name () (setf rt (,op ra rb))))
252 #+end_src
253 
254 #+name: prims
255 #+begin_src lisp :package hakmem
256  (def-prim add +)
257  (def-prim sub -)
258  (def-prim mul *)
259  (def-prim div /)
260  ;; divu
261  (def-prim rem mod)
262  ;; remu
263  (def-prim cmpeq =)
264  (def-prim cmpne /=)
265  (def-prim cmplt <)
266  (def-prim cmple <=)
267  (def-prim cmpgt >)
268  (def-prim cmpge >=)
269  ;; ltu leu gtu geu
270  (def-inst addi (i)
271  (setf rt (+ ra i)))
272  (def-inst muli (i)
273  (setf rt (* ra i)))
274  (def-prim and logand)
275  (def-prim or logior)
276  (def-prim xor logxor)
277 
278  (defun get-inst (i) (gethash i *instructions*))
279 
280  (defmacro %inst (i &body args)
281  `(funcall (get-inst ',i) ,@args))
282 
283  (defun list-instructions (&optional (tbl *instructions*))
284  (hash-table-alist tbl))
285 
286 #+end_src
287 #+name: instruction-list
288 #+begin_src lisp :results replace :exports both :package hakmem
289  (list-instructions)
290 #+end_src
291 
292 #+RESULTS: instruction-list
293 #+begin_example
294 ((XOR . #<FUNCTION (LAMBDA ()) {10077C774B}>)
295  (OR . #<FUNCTION (LAMBDA ()) {10077C777B}>)
296  (AND . #<FUNCTION (LAMBDA ()) {10077C77AB}>)
297  (MULI . #<FUNCTION (LAMBDA (I)) {10077C77DB}>)
298  (ADDI . #<FUNCTION (LAMBDA (I)) {10077C780B}>)
299  (CMPGE . #<FUNCTION (LAMBDA ()) {10077C783B}>)
300  (CMPGT . #<FUNCTION (LAMBDA ()) {10077C786B}>)
301  (CMPLE . #<FUNCTION (LAMBDA ()) {10077C789B}>)
302  (CMPLT . #<FUNCTION (LAMBDA ()) {10077C78CB}>)
303  (CMPNE . #<FUNCTION (LAMBDA ()) {10077C78FB}>)
304  (CMPEQ . #<FUNCTION (LAMBDA ()) {10077C792B}>)
305  (REM . #<FUNCTION (LAMBDA ()) {10077C795B}>)
306  (DIV . #<FUNCTION (LAMBDA ()) {10077C79AB}>)
307  (MUL . #<FUNCTION (LAMBDA ()) {10077C79EB}>)
308  (SUB . #<FUNCTION (LAMBDA ()) {10077C7A1B}>)
309  (ADD . #<FUNCTION (LAMBDA ()) {10077C7A4B}>))
310 #+end_example