18
|
1
|
#+title: A Bit of RISC |
|
2
|
#+date: [2024-03-11 Mon] |
20
|
3
|
#+setupfile: ../../../clean.theme |
18
|
4
|
I recently picked up [[https://dl.acm.org/doi/10.5555/2462741][Hacker's Delight]] and having a lot of fun with |
|
5
|
it. It's a collection of bit-manipulation tricks collected by hackers |
|
6
|
over many years. You can flip open pretty much anywhere in the book |
|
7
|
and start learn something really cool. |
|
8
|
|
|
9
|
There's something about seeing bit strings and assembly code in a book |
|
10
|
that really catches my attention, this one goes a bit further by even |
|
11
|
describing a complete RISC (Reduced Instruction Set Computer) we can |
|
12
|
implement to play around with the various tricks. |
|
13
|
|
|
14
|
As an exercise and for fun, I'd like to employ some Lisp-fu here and |
|
15
|
implement a small VM specifically designed for mangling bits. |
|
16
|
|
|
17
|
As a fair warning, I'm not a mathematician and I don't write proofs |
|
18
|
often. If I get something wrong or there is a better way of doing |
|
19
|
things, let me know! |
|
20
|
|
|
21
|
You can find most of the code from the book [[https://github.com/hcs0/Hackers-Delight][here]]. |
|
22
|
|
|
23
|
* Design |
|
24
|
** Data Representation |
|
25
|
We'll be sticking with a 32-bit word length as recommended in the |
|
26
|
Preface. We will also represent registers as integers whenever |
|
27
|
possible, instead of say a bit-vector. Without going into too much |
|
28
|
detail, it's much more efficient to do bitwise ops on integers |
|
29
|
instead of bit-vectors in Lisp. |
|
30
|
|
|
31
|
We need a minimum of 16 general purpose registers, typically of word |
|
32
|
length, with R0 reserved for a constant 0. To address 16 different |
|
33
|
registers we actually only need 4 bits - 5-bits if we needed 32, 6 for |
|
34
|
64, etc. |
|
35
|
|
|
36
|
Floating-point support and special purpose registers are not required. |
|
37
|
|
|
38
|
** Instructions |
|
39
|
The Hacker's Delight RISC architecture is described in two tables, |
|
40
|
denoted =basic RISC= and =full RISC= respectively. |
|
41
|
|
|
42
|
Most instructions take two source registers =RA= and =RB= with a |
|
43
|
destination register =RT=. The actual general-purpose registers are |
|
44
|
labelled =R0= (containg the constant 0) through =R15=. |
|
45
|
|
|
46
|
A 3-Address machine is assumed and some instructions take 16-bit |
|
47
|
signed or unsigned immediate values - denoted =I= and =Iu= |
|
48
|
respectively. |
|
49
|
|
|
50
|
#+tblname: Basic Instruction Set (basic RISC) |
|
51
|
| Opcode Mnemonic | Operands | Description | |
|
52
|
|-----------------------------------------------------------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------| |
|
53
|
| add,sub,mul,div,divu,rem,remu | RT,RA,RB | RT <- (op RA RB) | |
|
54
|
| addi,muli | RT,RA,I | RT <- (op RA I), I is a 16-bit signed immediate-value | |
|
55
|
| addis | RT,RA,I | RT <- (+ RA (\vert\vert I 0x0000)) | |
|
56
|
| and,or,xor | RT,RA,RB | RT <- (op RA RB) | |
|
57
|
| andi,ori,xori | RT,RA,Iu | As above, expect the last operand is a 16-bit unsigned immediate-value | |
|
58
|
| beq,bne,blt,ble,bgt,bge | RT,target | Branch to target if (op RT) | |
|
59
|
| bt,bf | RT,target | Branch true/false, same as bne/beq resp | |
|
60
|
| cmpeq,cmpne,cmplt,cmple,cmpgt,cmpge,cmpltu,cmpleu,cmpgtu,cmpgeu | RT,RA,RB | RT <- (if (op RA RB) 1 0) | |
|
61
|
| cmpieq,cmpine,cmpilt,cmpile,cmpigt,cmpige | RT,RA,I | Like cmpeq except second comparand is a 16-bit signed immediate-value | |
|
62
|
| cmpiequ,cmpineu,cmpiltu,cmpileu,cmpigtu,cmpigeu | RT,RA,I | Like cmpltu except second comparand is a 16-bit unsigned immediate-value | |
|
63
|
| ldbu,ldh,ldhu,ldw | RT,d(RA) | Load an unsigned-byte, signed-halfword, unsigned-halfword, or word into RT from (+ RA d) where d is a 16-bit signed immediate-value | |
|
64
|
| mulhs,mulhu | RT,RA,RB | RT gets the high-order 32 bits of (* RA RB) | |
|
65
|
| not | RT,RA | RT <- bitwise one's-complement of RA | |
|
66
|
| shl,shr,shrs | RT,RA,RB | RT <- RA shifted left or right by rightmost six bits of RB; 0-fill except for shrs, which is sign-fill (shift amount treated modulo 64) | |
|
67
|
| shli,shri,shrsi | RT,RA,Iu | RT <- RA shifted left or right by 5-bit immediate field | |
|
68
|
| stb,sth,stw | RS,d(RA) | Store a byte,halfword,word from RS into memory at location (+ RA d) where d is a 16-bit signed immediate-value | |
|
69
|
|
|
70
|
|
|
71
|
#+name: Addition Instructions (full RISC) |
|
72
|
| Opcode Mnemonic | Operands | Description | |
|
73
|
|-----------------------------------------------------------------+-----------+--------------------------------------------------------------------------------------------------------| |
|
74
|
| abs,nabs | RT,RA | RT <- (op RA) | |
|
75
|
| andc,eqv,nand,nor,orc | RT,RA,RB | RT <- (op RA RB) | |
|
76
|
| extr | RT,RA,I,L | extract bits I through I+L-1 of RA and place them right-adjusted in RT, with 0-fill | |
|
77
|
| extrs | RT,RA,I,L | Like extr, but sign-fill | |
|
78
|
| ins | RT,RA,I,L | Insert bits 0 through L-1 of RA into bits I through I+L-1 of RT | |
|
79
|
| nlz | RT,RA | RT gets count of leading 0's in RA (0 to 32) | |
|
80
|
| pop | RT,RA | RT gets the number of 1-bits in RA (0 to 32) | |
|
81
|
| ldb | RT,d(RA) | Load a signed byte into RT from memory at location (+ RA d) where d is a 16-bit signed immediate value | |
|
82
|
| moveq,movne,movlt,movle,movgt,movge | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB | |
|
83
|
| shlr,shrr | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB | |
|
84
|
| shlri,shrri | RT,RA,Iu | RT <- RA rotate-shifted left or right by the 5-bit immediate field | |
|
85
|
| trpeq,trpne,trplt,trple,trpgt,trpge,trpltu,trpleu,trpgtu,trpgeu | RA,RB | Trap (interrupt) if (op RA RB) | |
|
86
|
| trpieq,trpine,trpilt,trpile,trpigt,trpige | RA,I | Trap if (op RA I) where I is a 16-bit signed immediate-value | |
|
87
|
| trpiequ,trpineu,trpiltu,trpileu,trpigtu,trpigeu | RA,Iu | Trap if (op RA Iu) where Iu is a 16-bit unsigned immediate-value | |
|
88
|
|
|
89
|
There is also some extensions, which are like macros that usually |
|
90
|
expand to a single instruction. |
|
91
|
|
|
92
|
#+name: Extended Mnemonics |
|
93
|
| Extended Mnemonic | Expansion | Description | |
|
94
|
|-------------------+------------------+---------------------------| |
|
95
|
| b target | beq R0,target | Unconditional branch | |
|
96
|
| li RT,I | (addi,addis,ori) | Load immediate | |
|
97
|
| mov RT,RA | ori RT,RA,0 | Move register RA to RT | |
|
98
|
| neg RT,RA | sub RT,R0,RA | Negate (two's-complement) | |
|
99
|
| subi RT,RA,I | addi RT,RA,-I | Subtract immediate | |
|
100
|
|
|
101
|
All of these instructions are available on x86,arm,riscv and the likes |
|
102
|
so no real surprises. We will implement the basic set in Lisp, mapping |
|
103
|
instructions directly to Lisp functions using macros. |
|
104
|
|
|
105
|
** Execution Model |
|
106
|
We'll build this machine in Lisp and use plenty intrinsics from |
|
107
|
SBCL. As a starting point I followed Paul Khuong's excellent blog |
|
108
|
post: [[https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-code-breadboard/][SBCL: The ultimate assembly code breadboard]]. |
|
109
|
|
|
110
|
Some things to keep in mind for our machine: |
|
111
|
- every instruction requires at most two register reads and one |
|
112
|
register write - good for compilers |
|
113
|
- every instruction counts as a single cycle |
|
114
|
- we pay no attention to instruction-level parallelism |
|
115
|
|
|
116
|
* The HAKMEM VM |
|
117
|
:properties: |
|
118
|
:header-args: :session t :results none |
|
119
|
:end: |
|
120
|
#+name: defpackage |
|
121
|
#+begin_src lisp |
|
122
|
(ql:quickload :prelude) |
|
123
|
(in-package :std-user) |
|
124
|
(defpackage :hakmem |
|
125
|
(:use :cl :std) |
|
126
|
(:import-from :sb-assem :inst) |
|
127
|
(:import-from :sb-vm :immediate-constant :registers :zero :ea)) |
|
128
|
(in-package :hakmem) |
|
129
|
;; (in-package :sb-x86-64-asm) |
|
130
|
(in-readtable :std) |
|
131
|
(declaim (optimize (speed 3) (safety 1))) |
|
132
|
(defconstant +word-size+ 32 "default word size and register length.") |
|
133
|
#+end_src |
|
134
|
|
|
135
|
#+name: vars |
|
136
|
#+begin_src lisp :package hakmem |
|
137
|
(declaim (type (unsigned-byte #.+word-size+) +ro+)) |
|
138
|
(defconstant +r0+ 0 "constant value for register 0") |
|
139
|
(defvar *stack* (make-array 8 :initial-contents (list sb-vm::r8-tn |
|
140
|
sb-vm::r9-tn |
|
141
|
sb-vm::r10-tn |
|
142
|
sb-vm::r11-tn |
|
143
|
sb-vm::r12-tn |
|
144
|
sb-vm::r13-tn |
|
145
|
sb-vm::r14-tn |
|
146
|
sb-vm::r15-tn))) |
|
147
|
(defvar *stack-pointer*) |
|
148
|
|
|
149
|
(defvar *rax* sb-vm::rax-tn) |
|
150
|
(defvar *rbx* sb-vm::rax-tn) |
|
151
|
(defvar *rcx* sb-vm::rax-tn) |
|
152
|
(defvar *rdx* sb-vm::rax-tn) |
|
153
|
|
|
154
|
;; (@ 0) returns the (current) register for TOS, (@ 1) returns |
|
155
|
;; the one just below, etc. |
|
156
|
(defun @ (i) |
|
157
|
(aref *stack* (mod (+ i *stack-pointer*) (length *stack*)))) |
|
158
|
|
|
159
|
(defvar *code-base* sb-vm::rsi-tn) |
|
160
|
(defvar *virtual-ip* sb-vm::rdi-tn) |
|
161
|
(sb-x86-64-asm::get-gpr :qword 4) |
|
162
|
;; (sb-vm::immediate-constant-sc 10000) |
|
163
|
;; arena vector or list? |
|
164
|
(defvar *instructions* (make-hash-table :test #'equal)) |
|
165
|
|
|
166
|
(defvar *primitive-code-offset* (* 64 67)) |
|
167
|
|
|
168
|
(defstruct code-page |
|
169
|
(alloc 0) ;; next free byte |
|
170
|
(code (make-array *primitive-code-offset* :element-type 'octet))) |
|
171
|
|
|
172
|
(defun emit-code (pages emitter) |
|
173
|
;; there must be as many code pages as there are stack slots |
|
174
|
(assert (= (length *stack*) (length pages))) |
|
175
|
;; find the rightmost starting point, and align to 16 bytes |
|
176
|
(let* ((alloc (logandc2 (+ 15 (reduce #'max pages :key #'code-page-alloc)) |
|
177
|
15)) |
|
178
|
(bytes (loop for i below (length pages) |
|
179
|
for page = (elt pages i) |
|
180
|
collect (let ((segment (sb-assem:make-segment)) |
|
181
|
(*stack-pointer* i)) |
|
182
|
;; assemble the variant for this value |
|
183
|
;; of *stack-pointer* in a fresh code |
|
184
|
;; segment |
|
185
|
(sb-assem:assemble (segment) |
|
186
|
;; but first, insert padding |
|
187
|
(sb-vm::emit-long-nop segment (- alloc (code-page-alloc page))) |
|
188
|
(funcall emitter)) |
|
189
|
;; tidy up any backreference |
|
190
|
(sb-assem:finalize-segment segment) |
|
191
|
;; then get the (position-independent) machine |
|
192
|
;; code as a vector of bytes |
|
193
|
(sb-assem:segment-contents-as-vector segment))))) |
|
194
|
;; finally, copy each machine code sequence to the right code page |
|
195
|
(map nil (lambda (page bytes) |
|
196
|
(let ((alloc (code-page-alloc page))) |
|
197
|
(replace (code-page-code page) bytes :start1 alloc) |
|
198
|
(assert (<= (+ alloc (length bytes)) (length (code-page-code page)))) |
|
199
|
(setf (code-page-alloc page) (+ alloc (length bytes))))) |
|
200
|
pages bytes) |
|
201
|
;; and return the offset for that code sequence |
|
202
|
alloc)) |
|
203
|
|
|
204
|
(defun emit-all-code (&rest emitters) |
|
205
|
(let ((pages (loop repeat (length *stack*) |
|
206
|
for page = (make-code-page) |
|
207
|
;; prefill everything with one-byte NOPs |
|
208
|
do (fill (code-page-code page) #x90) |
|
209
|
collect page))) |
|
210
|
(values (mapcar (lambda (emitter) |
|
211
|
(emit-code pages emitter)) |
|
212
|
emitters) |
|
213
|
pages))) |
|
214
|
|
|
215
|
(defun next (&optional offset) |
|
216
|
(setf offset (or offset 0)) ; accommodate primops that frob IP |
|
217
|
(let ((rotation (mod *stack-pointer* (length *stack*)))) |
|
218
|
(inst movzx *rax* (make-ea :dword :base *virtual-ip* |
|
219
|
:disp offset)) |
|
220
|
(unless (= -4 offset) |
|
221
|
(inst add *virtual-ip* (+ 4 offset))) |
|
222
|
(if (zerop rotation) |
|
223
|
(inst add *rax* *code-base*) |
|
224
|
(inst lea *rax* (make-ea :qword :base *code-base* |
|
225
|
:index *rax* |
|
226
|
:disp (* rotation *primitive-code-offset*)))) |
|
227
|
(inst jmp *rax*))) |
|
228
|
|
|
229
|
(defun swap () |
|
230
|
(inst xchg (@ 0) (@ 1)) ; exchange top of stack and stack[1] |
|
231
|
(next)) |
|
232
|
|
|
233
|
#+end_src |
|
234
|
|
|
235
|
#+name: instructions |
|
236
|
#+begin_src lisp :package hakmem |
|
237
|
;; todo |
|
238
|
(defun %parse-reg3 (rt ra rb)) |
|
239
|
(defun %parse-reg2i (rt ra i)) |
|
240
|
(defun %parse-reg2ui (rt ra ui)) |
|
241
|
(defmacro def-inst (name args &body body) |
|
242
|
;; todo: compose a function based on regs+args+body |
|
243
|
`(let ((sc *scratch*) |
|
244
|
(r0 +r0+) |
|
245
|
(ra 0) |
|
246
|
(rb 0) |
|
247
|
(rt 0)) |
|
248
|
(declare (ignorable sc r0 ra rb rt)) |
|
249
|
(setf (gethash ',name *instructions*) (lambda ,args (progn ,@body))))) |
|
250
|
|
|
251
|
(defmacro def-prim (name op) |
|
252
|
`(def-inst ,name () (setf rt (,op ra rb)))) |
|
253
|
#+end_src |
|
254
|
|
|
255
|
#+name: prims |
|
256
|
#+begin_src lisp :package hakmem |
|
257
|
(def-prim add +) |
|
258
|
(def-prim sub -) |
|
259
|
(def-prim mul *) |
|
260
|
(def-prim div /) |
|
261
|
;; divu |
|
262
|
(def-prim rem mod) |
|
263
|
;; remu |
|
264
|
(def-prim cmpeq =) |
|
265
|
(def-prim cmpne /=) |
|
266
|
(def-prim cmplt <) |
|
267
|
(def-prim cmple <=) |
|
268
|
(def-prim cmpgt >) |
|
269
|
(def-prim cmpge >=) |
|
270
|
;; ltu leu gtu geu |
|
271
|
(def-inst addi (i) |
|
272
|
(setf rt (+ ra i))) |
|
273
|
(def-inst muli (i) |
|
274
|
(setf rt (* ra i))) |
|
275
|
(def-prim and logand) |
|
276
|
(def-prim or logior) |
|
277
|
(def-prim xor logxor) |
|
278
|
|
|
279
|
(defun get-inst (i) (gethash i *instructions*)) |
|
280
|
|
|
281
|
(defmacro %inst (i &body args) |
|
282
|
`(funcall (get-inst ',i) ,@args)) |
|
283
|
|
|
284
|
(defun list-instructions (&optional (tbl *instructions*)) |
|
285
|
(hash-table-alist tbl)) |
|
286
|
|
|
287
|
#+end_src |
|
288
|
#+name: instruction-list |
|
289
|
#+begin_src lisp :results replace :exports both :package hakmem |
|
290
|
(list-instructions) |
|
291
|
#+end_src |
|
292
|
|
|
293
|
#+RESULTS: instruction-list |
|
294
|
#+begin_example |
|
295
|
((XOR . #<FUNCTION (LAMBDA ()) {10077C774B}>) |
|
296
|
(OR . #<FUNCTION (LAMBDA ()) {10077C777B}>) |
|
297
|
(AND . #<FUNCTION (LAMBDA ()) {10077C77AB}>) |
|
298
|
(MULI . #<FUNCTION (LAMBDA (I)) {10077C77DB}>) |
|
299
|
(ADDI . #<FUNCTION (LAMBDA (I)) {10077C780B}>) |
|
300
|
(CMPGE . #<FUNCTION (LAMBDA ()) {10077C783B}>) |
|
301
|
(CMPGT . #<FUNCTION (LAMBDA ()) {10077C786B}>) |
|
302
|
(CMPLE . #<FUNCTION (LAMBDA ()) {10077C789B}>) |
|
303
|
(CMPLT . #<FUNCTION (LAMBDA ()) {10077C78CB}>) |
|
304
|
(CMPNE . #<FUNCTION (LAMBDA ()) {10077C78FB}>) |
|
305
|
(CMPEQ . #<FUNCTION (LAMBDA ()) {10077C792B}>) |
|
306
|
(REM . #<FUNCTION (LAMBDA ()) {10077C795B}>) |
|
307
|
(DIV . #<FUNCTION (LAMBDA ()) {10077C79AB}>) |
|
308
|
(MUL . #<FUNCTION (LAMBDA ()) {10077C79EB}>) |
|
309
|
(SUB . #<FUNCTION (LAMBDA ()) {10077C7A1B}>) |
|
310
|
(ADD . #<FUNCTION (LAMBDA ()) {10077C7A4B}>)) |
|
311
|
#+end_example |