Adventures in x86 ASM with rx86
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I just finished ‘Code: The Hidden Language of Computer Hardware and Software’ by Charles Petzold which was a really well-written (in my opinion) guided journey from flashing a light in morse code through to building a whole computer, and everything needed along the way.
The section on encoding instructions for the processor (built up from logic gates) – assembly instructions as a human readable version of the machine code – was particularly interesting to me, and as I was describing this to a colleague I remembered that it’s not the first time I’ve played with assembly…
Years and years ago (I don’t recall how it actually started) I spent some time trying to solve a puzzle. I don’t recall whether I saw the puzzle or a solution first, but I do remember wanting to be able to understand it properly, and ideally be able to use some software I wrote to reach the solution.
The puzzle was just a set of characters on a poster for the (then named) Australian Defence Signals Directorate (now the Australian Signals Directorate – one of our Secret Squirrel orgs) at ruxcon in 2011
Yes, that was a long time ago, but I never wrote up what I did, and now seems like a good enough time to get really distracted.
I would be surprised if I understood it well enough at the time, so I suspect I was aware of this blogpost which walks through the solution (spoilers). Nonetheless, I wanted to be able to do that myself, not just follow some instructions – I was confident that I could write enough code (of some sort) to go from this sequence of letters and symbols to the final solution.
My attempts at the time were mostly command-line attempts; the blog post linked above uses only web services, so that felt like I could make it ‘my own’. I first needed to get the characters into my computer – that’s just writing them out to a text file, say, a file called dsd
# dsd: 6AAAAABbi8uDwx4zwDPSigOK ETLCiAM8AHQrg8EBg8MB6+wz /7/z+TEct0SlpGf5dRyl53US YQEE56Ri7Kdkj8IAABkcOsw=
Knowing that this is base-64 encoded, I can decode it with hexdump
cat dsd | base64 -d | hexdump -C 00000000 e8 00 00 00 00 5b 8b cb 83 c3 1e 33 c0 33 d2 8a |.....[.....3.3..| 00000010 03 8a 11 32 c2 88 03 3c 00 74 2b 83 c1 01 83 c3 |...2...<.t+.....| 00000020 01 eb ec 33 ff bf f3 f9 31 1c b7 44 a5 a4 67 f9 |...3....1..D..g.| 00000030 75 1c a5 e7 75 12 61 01 04 e7 a4 62 ec a7 64 8f |u...u.a....b..d.| 00000040 c2 00 00 19 1c 3a cc |.....:.| 00000047
To just get the bytecode, I used some different options and saved the file as dsd.hex
cat dsd | base64 -d | hexdump -v -e '/1 %02X ' > dsd.hex # dsd.hex: E8 00 00 00 00 5B 8B CB 83 C3 1E 33 C0 33 D2 8A 03 8A 11 32 C2 88 03 3C 00 74 2B 83 C1 01 83 C3 01 EB EC 33 FF BF F3 F9 31 1C B7 44 A5 A4 67 F9 75 1C A5 E7 75 12 61 01 04 E7 A4 62 EC A7 64 8F C2 00 00 19 1C 3A CC
I did go a similar route to the linked blogpost and converted these bytes to shellcode, wrapped them in a C program and disassembled it with gdb
, but much simpler was to use a better tool, in this case udis which I needed to install separately. This
gives the same result as the blogpost, which was nice
udcli -x dsd.hex > dsd.hex.asm # dsd.hex.asm: 0000000000000000 e800000000 call 0x5 0000000000000005 5b pop ebx 0000000000000006 8bcb mov ecx, ebx 0000000000000008 83c31e add ebx, 0x1e 000000000000000b 33c0 xor eax, eax 000000000000000d 33d2 xor edx, edx 000000000000000f 8a03 mov al, [ebx] 0000000000000011 8a11 mov dl, [ecx] 0000000000000013 32c2 xor al, dl 0000000000000015 8803 mov [ebx], al 0000000000000017 3c00 cmp al, 0x0 0000000000000019 742b jz 0x46 000000000000001b 83c101 add ecx, 0x1 000000000000001e 83c301 add ebx, 0x1 0000000000000021 ebec jmp 0xf 0000000000000023 33ff xor edi, edi 0000000000000025 bff3f9311c mov edi, 0x1c31f9f3 000000000000002a b744 mov bh, 0x44 000000000000002c a5 movsd 000000000000002d a4 movsb 000000000000002e 67f9 a16 stc 0000000000000030 751c jnz 0x4e 0000000000000032 a5 movsd 0000000000000033 e775 out 0x75, eax 0000000000000035 126101 adc ah, [ecx+0x1] 0000000000000038 04e7 add al, 0xe7 000000000000003a a4 movsb 000000000000003b 62ec invalid 000000000000003d a7 cmpsd 000000000000003e 648fc2 pop edx 0000000000000041 0000 add [eax], al 0000000000000043 191c3a sbb [edx+edi], ebx 0000000000000046 cc int3
At this point, I got a bit lost (at the time) because I didn’t understand assembly well enough (or at all), so, continuing with the logic presented in the linked blogpost, I considered just working with the bytes directly.
All we really need to do it take the bytes starting at 0x5
and 0x23
and xor
them. I figured I’ll need the decimal value of these addresses; 0x5
is just 5, but 0x23
= 16*2 + 3 = 35. We can of course get this via printf
printf "%d\n" 0x23 ## 35
or less simply with the built-in calculator tool bc
, going from (input) base 16 to (output) base 10
echo "obase=10;ibase=16; 23" | bc ## 35
I placed the bytes in sequence (removing spaces) with
str=$(cat dsd.hex | sed 's/ //g') echo $str ## E8000000005B8BCB83C31E33C033D28A038A1132C288033C00742B83C10183C301EBEC33FFBFF3F9311CB744A5A467F9751CA5E77512610104E7A462ECA7648FC20000191C3ACC
Since I have 2 characters per hex, 0x5
starts at character 10, and 0x23
starts at character 70, so we define our strings as
astr=${str:70} # 0x23 to end echo $astr ## 33FFBFF3F9311CB744A5A467F9751CA5E77512610104E7A462ECA7648FC20000191C3ACC
and
$ bstr=${str:10} # 0x5 to end $ echo $bstr ## 5B8BCB83C31E33C033D28A038A1132C288033C00742B83C10183C301EBEC33FFBFF3F9311CB744A5A467F9751CA5E77512610104E7A462ECA7648FC20000191C3ACC
There is overlap here, which we will have to deal with when we get to it. For now, we want to xor
these. Let’s cut these down to 60 characters (where they start to overlap)
trimastr=${astr:0:60} echo $trimastr ## 33FFBFF3F9311CB744A5A467F9751CA5E77512610104E7A462ECA7648FC2 trimbstr=${bstr:0:60} echo $trimbstr ## 5B8BCB83C31E33C033D28A038A1132C288033C00742B83C10183C301EBEC
The command xor
(^
) chokes on this many digits (in fact, any more than about 8) so I’ve written a script to xor
the characters one at a time:
for i in {0..59} ; do echo $(( 0x${astr:$i:1} ^ 0x${bstr:$i:1} )) | awk '{printf "%X",$0}' ; done > xor.dat # xor.dat: 687474703A2F2F7777772E6473642E676F762E61752F6465636F6465642E
What about that bit that overlapped? we only xored the first 60 characters, but the length of the string is
echo ${#astr} ## 72
so we need those first 12 (72-12) characters (6 hex) of overlap
olap1=${astr:60:12} echo $olap1 ## 0000191C3ACC
The assembly code would have overwritten the overlapped part by the time it reached there, so we need to xor
with the xor
’d part, i.e. the first 12 characters of xor.dat
olap2=$(cat xor.dat | cut -c -12) echo $olap2 ## 687474703A2F
Now, finally, do the last xor
(^
)
for i in {0..11} ; do echo $(( 0x${olap1:$i:1} ^ 0x${olap2:$i:1} )) | awk '{printf "%X",$0}' ; done > xorolap.dat # xorolap.dat: 68746D6C00E3
so this needs to go at the end of our xor.dat
cat xor.dat xorolap.dat > xorfinal.dat # xorfinal.dat: 687474703A2F2F7777772E6473642E676F762E61752F6465636F6465642E68746D6C00E3
The final 00
and after is useless, so let’s drop it. Finally, we just need to convert this back to text using xxd -r
cat xorfinal.dat | sed 's/[0]\{2\}.*//' | xxd -r -p > dsd.sol
Phew. I’m not going to reveal the solution just yet, because this isn’t the end of the story (but I did get the right answer).
So, that’s a commandline solution to (at least this part) of the puzzle. But now I know R!
Learning more about assembly from the book ‘Code’, it occurred to me that the
operations - which could be implemented with something as simple as telegraph relays (or crabs) - were just operations on data. Given an input, produce an output (sort of). A MOV
operation just moved some value stored at some address to another address (or to/from a register). This felt like it could be simple enough to encode in some R functions. Perhaps not some “pure” R functions, because I want the side-effect of altering a global memory bank, but surely I could do simple things like ADD
.
I looked around to see if someone else had done this before. As is usually the case with odd requests, Mike (a.k.a. @coolbutuseless) has done something similar in the form of r64 which I didn’t appreciate was a sufficiently distinct flavour of assembly (I never had a Commodore64, we had an Amstrad CPC664 on which I really only played games). After a quick PR to bring that repo up to date with other changes by the author (a migration of one dependency) I realised this wasn’t what I needed, but did learn a lot from how it was structured.
Okay, on to building something myself. I knew I’d need some memory and some registers. The registers seemed easy - they wouldn’t hold a lot and I could address them by name, e.g. eax
. An environment seemed natural, both because of the named list structure, and because I knew it would be mutable. That seemed like a benefit for this use-case - having a global set of registers I could move data in and out of without making copies of the thing or passing it around everywhere.
Next I’d need memory. I figured a vector of hex value made sense, but I wanted to be able to refer to the first one as 0x00
. Now, the names of a vector need to be a character vector - you can’t use the actual hex values
memory <- c(0x00 = 0x19, 0x01 = 0x1a, 0x02 = 0x1b) ## Error: <text>:1:18: unexpected '=' ## 1: memory <- c(0x00 = ## ^
so we need to use character strings
memory <- c("0x00" = 0x19, "0x01" = 0x1a, "0x02" = 0x1b) memory ## 0x00 0x01 0x02 ## 25 26 27
More importantly, we’ll need to ensure we only refer to these by the character
strings because [
first tries a coercion to integer, which, side-note, is why this works
(1:10)[2.3] # since as.integer(2.3) == 2 ## [1] 2 (1:10)[4.7] # since as.integer(4.7) == 4 ## [1] 4
The risk is that we use a hex value to extract an element, in which case we might accidentally try to get the first value with
memory[0x00] ## named numeric(0)
Instead, we want
memory["0x00"] ## 0x00 ## 25
In order to make sure we always do this, we need a sanitize()
function which
always returns the string.
We can convert a value to hexmode with as.hexmode
, but that’s a lot of typing, so I added an alias as
hex <- as.hexmode
For processing assembly instructions, we might see something like
mov eax, 0x5
which should move the value 0x5
into the register eax
… so we’ll need a way
to distinguish direct addresses from registers. Worse still, we might refer to the address stored in a register, as [eax]
. A reg_or_val()
function would identify anything which points to an address (containing a [
), any of the named registers, or a value, and would return the address (or where that points).
With all of those pieces, the only thing left is to actually be able to run code.
Assembly runs sequentially through the instructions, unless we encounter some flow
control opcodes (e.g. JMP
- jump to address - I’ll keep calling them opcodes but mnemonics is a more correct term). The basic process would then be to
read in the instruction, identify the opcode and the arguments, and execute, modifying the memory and registers in-place. Once that’s done, we move to the next instruction.
With flow control we might need to identify a different address to go to next, and that might depend on the status of the registers, for example JNZ 0x00
jumps to address 0x00
if the zero flag is not set. So, we can execute the current instruction but then apply any flag-based logic to see if we need to go to a new address, and go wherever we should go go next. This is implemented as runasm()
That takes care of running the code, but what are we running? Oh, operations. Right. Well, we need some of those. Going through the opcodes I need for the puzzle, I’ll need the following:
call pop mov add xor cmp jz jmp halt
CALL
just pushes a value onto the stack (register esp
), POP
retrieves it and
stores it at an address, MOV
as we said moves a value from place to place, ADD
adds two values, XOR
does what it suggests, and so on. These don’t seem tricky to implement, for example
mov <- function(x, y) { # copy y into x res <- hex(reg_or_val(y)) if (x %in% names(registers)) { assign(x, res, envir = registers) } else { mem[sanitize(x)] <<- sanitize(res) } return(invisible(NULL)) }
The wrinkle will be that particular instructions also update registers, for
example an ADD
stores whether the result was 0x00
in the zero-flag register
function(x, y) { # add y to x and save in x res <- hex(reg_or_val(x)) + hex(reg_or_val(y)) if (x %in% names(registers)) { assign(x, res, envir = registers) } else { mem[sanitize(x)] <<- sanitize(res) } assign("zf", hex(as.integer(res == 0x00)), envir = registers) return(invisible(x)) }
A JMP
(or other jump) will check this register and jump (or not) accordingly.
With these pieces in place, an R package was a natural home for the code, so I can now present the {rx86}
package: https://github.com/jonocarroll/rx86
Let’s use it to solve the puzzle!!!
Starting with the puzzle string
dsd <- "6AAAAABbi8uDwx4zwDPSigOK ETLCiAM8AHQrg8EBg8MB6+wz /7/z+TEct0SlpGf5dRyl53US YQEE56Ri7Kdkj8IAABkcOsw="
we decode it (this time in R)
(b64 <- base64enc::base64decode(dsd)) ## [1] e8 00 00 00 00 5b 8b cb 83 c3 1e 33 c0 33 d2 8a 03 8a 11 32 c2 88 03 3c 00 ## [26] 74 2b 83 c1 01 83 c3 01 eb ec 33 ff bf f3 f9 31 1c b7 44 a5 a4 67 f9 75 1c ## [51] a5 e7 75 12 61 01 04 e7 a4 62 ec a7 64 8f c2 00 00 19 1c 3a cc
This will be the only non-R part: we still need to disassemble the bytecode into assembly, but we can do that from R with a system()
call to udcli
(disas <- system("udcli -x", input = paste(b64, collapse = " "), intern = TRUE)) ## [1] "0000000000000000 e800000000 call 0x5 " ## [2] "0000000000000005 5b pop ebx " ## [3] "0000000000000006 8bcb mov ecx, ebx " ## [4] "0000000000000008 83c31e add ebx, 0x1e " ## [5] "000000000000000b 33c0 xor eax, eax " ## [6] "000000000000000d 33d2 xor edx, edx " ## [7] "000000000000000f 8a03 mov al, [ebx] " ## [8] "0000000000000011 8a11 mov dl, [ecx] " ## [9] "0000000000000013 32c2 xor al, dl " ## [10] "0000000000000015 8803 mov [ebx], al " ## [11] "0000000000000017 3c00 cmp al, 0x0 " ## [12] "0000000000000019 742b jz 0x46 " ## [13] "000000000000001b 83c101 add ecx, 0x1 " ## [14] "000000000000001e 83c301 add ebx, 0x1 " ## [15] "0000000000000021 ebec jmp 0xf " ## [16] "0000000000000023 33ff xor edi, edi " ## [17] "0000000000000025 bff3f9311c mov edi, 0x1c31f9f3 " ## [18] "000000000000002a b744 mov bh, 0x44 " ## [19] "000000000000002c a5 movsd " ## [20] "000000000000002d a4 movsb " ## [21] "000000000000002e 67f9 a16 stc " ## [22] "0000000000000030 751c jnz 0x4e " ## [23] "0000000000000032 a5 movsd " ## [24] "0000000000000033 e775 out 0x75, eax " ## [25] "0000000000000035 126101 adc ah, [ecx+0x1] " ## [26] "0000000000000038 04e7 add al, 0xe7 " ## [27] "000000000000003a a4 movsb " ## [28] "000000000000003b 62ec invalid " ## [29] "000000000000003d a7 cmpsd " ## [30] "000000000000003e 648fc2 pop edx " ## [31] "0000000000000041 0000 add [eax], al " ## [32] "0000000000000043 191c3a sbb [edx+edi], ebx " ## [33] "0000000000000046 cc int3 "
We then read this back into R as a data.frame
asm <- suppressWarnings( readr::read_fwf(paste(disas, collapse = "\n"), col_types = "ccc", col_positions = readr::fwf_widths(c(16, 16, 21))) ) colnames(asm) <- c("addr", "bytecode", "instr") # trim the leading 0s from addr since this is all we're using asm$addr <- substr(asm$addr, nchar(asm$addr)-1, nchar(asm$addr)) asm ## # A tibble: 33 x 3 ## addr bytecode instr ## <chr> <chr> <chr> ## 1 00 e800000000 call 0x5 ## 2 05 5b pop ebx ## 3 06 8bcb mov ecx, ebx ## 4 08 83c31e add ebx, 0x1e ## 5 0b 33c0 xor eax, eax ## 6 0d 33d2 xor edx, edx ## 7 0f 8a03 mov al, [ebx] ## 8 11 8a11 mov dl, [ecx] ## 9 13 32c2 xor al, dl ## 10 15 8803 mov [ebx], al ## # … with 23 more rows
The last instruction, int3
, is an interrupt, but let’s generalise it to a halt
because we’ll be done
asm[33, "instr"] <- "halt"
We can run this with {rx86}
… we need a memory array and some registers
mem <- create_mem() registers <- create_reg()
Then we can run the code
runasm(asm)
As we saw earlier, the ‘code’ part of the asm is stored in 0x00
to 0x21
with the remaining addresses being used for temporary storage, from 0x23
. The operations encoded
perform an XOR
between the values stored at 0x05
through to 0x21
with those starting at 0x23
, storing the results starting at 0x23
.
Extracting the memory from this offset onward (up to where it zeroes) results in
(mem_offset <- mem[which(names(mem) == "0x23"):length(mem)]) ## 0x23 0x24 0x25 0x26 0x27 0x28 0x29 0x2a 0x2b 0x2c 0x2d ## "0x68" "0x74" "0x74" "0x70" "0x3a" "0x2f" "0x2f" "0x77" "0x77" "0x77" "0x2e" ## 0x2e 0x2f 0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 ## "0x64" "0x73" "0x64" "0x2e" "0x67" "0x6f" "0x76" "0x2e" "0x61" "0x75" "0x2f" ## 0x39 0x3a 0x3b 0x3c 0x3d 0x3e 0x3f 0x40 0x41 0x42 0x43 ## "0x64" "0x65" "0x63" "0x6f" "0x64" "0x65" "0x64" "0x2e" "0x68" "0x74" "0x6d" ## 0x44 0x45 0x46 0x47 0x48 0x49 0x4a 0x4b 0x4c 0x4d 0x4e ## "0x6c" "0x00" "0xcc" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" ## 0x4f 0x50 0x51 0x52 0x53 0x54 0x55 0x56 0x57 0x58 0x59 ## "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" ## 0x5a 0x5b 0x5c 0x5d 0x5e 0x5f 0x60 0x61 0x62 0x63 0x64 ## "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" ## 0x65 0x66 0x67 0x68 0x69 0x6a 0x6b 0x6c 0x6d 0x6e 0x6f ## "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" ## 0x70 0x71 0x72 0x73 0x74 0x75 0x76 0x77 0x78 0x79 0x7a ## "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" "0x00" ## 0x7b 0x7c 0x7d 0x7e 0x7f ## "0x00" "0x00" "0x00" "0x00" "0x00"
And then lastly, we need to convert this sequence of hex values into characters. I’ve added a helper which achieves this, dropping everything after the first null-terminating byte (0x00
) then
hex2string(mem_offset) ## [1] "http://www.dsd.gov.au/decoded.html"
TADA!
This example is stored along with the package as a vignette, so
vignette("dsd_ruxcon_challenge", package = "rx86")
The link in this solution just redirects to the ASD frontpage since the puzzle is now over 10 years old, but when it was active it led to a page with some binary
0100 0011 0100 1111 0100 1110 0100 0111 0101 0010 0100 0001 0101 0100 0101 0101 0100 1100 0100 0001 0101 0100 0100 1001 0100 1111 0100 1110 0101 0011
Originally, I solved this part at the command line by storing this code in a file named decoded
and running a similar bc
conversion to before, but this time from binary (ibase=2) to hex (obase=16), storing the result in decoded.hex
for bin in $(cat decoded) ; do echo "obase=16;ibase=2; $bin" | bc >> decoded.hex ; done
From there, removing the line breaks and spaces, and passing through xxd
in reverse (similar to hexdump
but reverse works on my machine)
cat decoded.hex | tr '\n' ' ' | sed 's/ //g' | xxd -r -p
Again, I’ll hold off showing the answer, but it was correct.
It would be satisfying to also do this in R, so I added another helper in {rx86}
that does this conversion - it’s not terribly complex, but involves splitting a string into pairs of strings (split at some point) and a conversion
split_pairs <- function(x, split = "") { sst <- strsplit(x, split)[[1]] out <- paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)]) paste0(out, collapse = ",") } bin2ascii <- function(bin) { nolb <- gsub("\n", " ", bin) split <- strsplit(split_pairs(paste(nolb, collapse = " "), split = " "), ",")[[1]] ints <- strtoi(split, base = 2) intToUtf8(ints) }
It appears to do the job
binary <- "0100 0011 0100 1111 0100 1110 0100 0111 0101 0010 0100 0001 0101 0100 0101 0101 0100 1100 0100 0001 0101 0100 0100 1001 0100 1111 0100 1110 0101 0011" bin2ascii(binary) ## [1] "CONGRATULATIONS"
There, an (almost) entirely R solution to the puzzle, and all it took was writing my own x86 assembly parser.
I did want to see if I’d made my parser too specific and it only worked with this one example around which I’d designed it, so I wanted to add another example. This
journey started with the book ‘Code’, so it felt fitting to use an example from there. In the book, an example of multiplying two 8-bit numbers is used, which
involved an ADC
(add with carry) operation to handle overflow. This seemed like a good candidate, and I started coding it, but soon realised that the exact routine
relied on an add al, 0xff
which has the effect of adding -1 in 8-bit, but on my machine
as.integer(0xff) ## [1] 255
and
as.hexmode(-1) ## [1] "ffffffff"
which isn’t compatible. I could instead code a SUB
opcode and sub al, 0x01
(which I did) but at this point I decided to abandon the 8-bit idea and simplify down to just doing essential part of the program which multiplies 127 and 28 through repeated additions (via a loop). The asm for this is also stored in the package, and executing it is as simple as
mult_asm <- suppressWarnings( readr::read_fwf(system.file("asm", "mult.asm", package = "rx86"), col_types = "ccc", col_positions = readr::fwf_widths(c(3, 6, 20))) ) colnames(mult_asm) <- c("addr", "bytecode", "instr") print(mult_asm) ## # A tibble: 14 x 3 ## addr bytecode instr ## <chr> <chr> <chr> ## 1 00 101005 mov al, [0x22] ## 2 03 201001 add al, [0x20] ## 3 06 111005 mov [0x22], al ## 4 09 101004 mov al, [0x21] ## 5 0c 221000 sub al, 0x01 ## 6 0f 111004 mov 0x21, al ## 7 12 101003 jnz 0x00 ## 8 15 20001e halt ## 9 18 111003 invalid ## 10 1b 330000 invalid ## 11 1e ff00 invalid ## 12 20 a7 data 167 ## 13 21 1c data 28 ## 14 22 00 result
Again, we need a new memory array and some registers
mem <- create_mem(len = 64) registers <- create_reg()
Then run the code
runasm(mult_asm)
The final result can be extracted but it is still a hex value
mem[sanitize(0x22)] ## 0x22 ## "0x1244"
Converting it to an integer gives the expected result
as.integer(mem[sanitize(0x22)]) ## [1] 4676 167*28 ## [1] 4676
This, too, is stored as a vignette in the package, and can be found with
vignette("mult_code_petzold", package = "rx86")
The package is far from perfect, and only supports what I needed it to, but I’ve learned a lot about assembly and got to build something I’ve always wanted to. Plus I’ve finally written up my process for this puzzle that has been sitting on a disused laptop for a decade.
That’s not quite the end, though - I really wanted to test out what I’ve learned so far, and what good is a new programming ability without a “Hello, world!” example?
Almost all of the examples I found floating around use ‘modern’ asm (without the bytecode) and allow such luxuries as “storing a string” and “system calls” - none of that here, thank you. Instead, I added a new opcode mnemonic int 0x80
which sort of does what it should - it writes to screen the value (converted to character) of whatever is in the register eax
. That’s helpful, but I still need
the assembly that will use that. This is where I feel I’ve actually hand-programmed something myself. This is a piece of code that could literally have been punched into a card
The whole thing works, of course
hello_asm <- suppressWarnings( readr::read_fwf(system.file("asm", "helloworld.asm", package = "rx86"), col_types = "ccc", col_positions = readr::fwf_widths(c(3, 3, 20))) ) colnames(hello_asm) <- c("addr", "bytecode", "instr") print(hello_asm) ## # A tibble: 23 x 3 ## addr bytecode instr ## <chr> <chr> <chr> ## 1 00 10 mov ecx, 0x0e ## 2 01 10 mov al, 0x08 ## 3 02 10 mov eax, [al] ## 4 03 cc int 0x80 ## 5 04 28 sub ecx, 0x01 ## 6 05 70 jz 0x17 ## 7 06 05 add al, 0x01 ## 8 07 e9 jmp 0x02 ## 9 08 48 data ## 10 09 65 data ## # … with 13 more rows mem <- create_mem() registers <- create_reg() runasm(hello_asm) ## Hello, world!
and I find that honestly, ridiculously pleasing.
This, too, is included in the package as
vignette("helloworld", package = "rx86")
I’m satisfied that {rx86}
works, at least in some sense.
I’ve learned a lot along the way, and who knows, maybe I’ll add some more opcodes to the package. If you have some suggestions, please let me know!
devtools::session_info()
## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.3 (2020-10-10) ## os Pop!_OS 20.10 ## system x86_64, linux-gnu ## ui X11 ## language en_AU:en ## collate en_AU.UTF-8 ## ctype en_AU.UTF-8 ## tz Australia/Adelaide ## date 2021-12-23 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.0.3) ## blogdown 1.7 2021-12-19 [1] CRAN (R 4.0.3) ## bookdown 0.24 2021-09-02 [1] CRAN (R 4.0.3) ## bslib 0.3.1 2021-10-06 [1] CRAN (R 4.0.3) ## callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.3) ## cli 3.1.0 2021-10-27 [1] CRAN (R 4.0.3) ## crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.3) ## desc 1.4.0 2021-09-28 [1] CRAN (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.3) ## digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3) ## dplyr 1.0.2 2020-08-18 [1] CRAN (R 4.0.3) ## ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.3) ## evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.3) ## fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.3) ## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.3) ## fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.3) ## generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.3) ## hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.3) ## htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.3) ## jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.3) ## knitr 1.37 2021-12-16 [1] CRAN (R 4.0.3) ## lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.0.3) ## magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3) ## memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.3) ## pillar 1.4.7 2020-11-20 [1] CRAN (R 4.0.3) ## pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.3) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.3) ## processx 3.4.5 2020-11-30 [1] CRAN (R 4.0.3) ## ps 1.5.0 2020-12-05 [1] CRAN (R 4.0.3) ## purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.3) ## R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3) ## readr 1.4.0 2020-10-05 [1] CRAN (R 4.0.3) ## remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.3) ## rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.3) ## rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.0.3) ## rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.3) ## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.3) ## rx86 * 0.1.0 2021-12-22 [1] local ## sass 0.4.0 2021-05-12 [1] CRAN (R 4.0.3) ## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.3) ## testthat 3.0.1 2020-12-17 [1] CRAN (R 4.0.3) ## tibble 3.0.4 2020-10-12 [1] CRAN (R 4.0.3) ## tidyr 1.1.2 2020-08-27 [1] CRAN (R 4.0.3) ## tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.3) ## usethis 2.1.5 2021-12-09 [1] CRAN (R 4.0.3) ## utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.3) ## vctrs 0.3.6 2020-12-17 [1] CRAN (R 4.0.3) ## withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.3) ## xfun 0.29 2021-12-14 [1] CRAN (R 4.0.3) ## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.3) ## ## [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.0 ## [2] /usr/local/lib/R/site-library ## [3] /usr/lib/R/site-library ## [4] /usr/lib/R/library
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.