I Made a CPU for a CTF Challenge (ft. my blood, sweat, and tears)


      
So I was doing my internship at Trend Micro, and during my free time I was looking around at CTF challenges for ideas cause I was also a CTF Dev for ISSessions CTF 2026. One day, I asked a colleague of mine if he had any cool ideas and he showed me an article about a DEFCON challenge that had participants reversing a completely custom instruction set. I can't for the life of me find the link or even remember the challenge name (believe me I've tried finding it since), but the concept stuck with me like glue. A fully custom ISA, no documentation, just raw bytes and vibes, a true nightmare for the players ^w^.



At the time, the usual "here's a stripped ELF/PE, figure it out" style reversing challenge was getting a bit stale for me. I wanted something that would genuinely make people stop and think ~~and cry tears of blood~~. Something where they couldn't just throw it into Ghidra/Cutter (or IDA for some of you weirdos) or dump the memory in x64dbg/gdb and call it a day.

So naturally, I decided to design an entire CPU architecture from scratch. How hard could it be?

## The "How Hard Can It Be" Phase

Famous last words.

So I start sketching things out, taking heavy inspiration from Intel's x86 because that's what I was most familiar with, and then I started twisting things around. The fun part was sitting there figuring out what combination of bit lengths would be the funniest (for me) and the most painful (for participants). Like genuinely just going through different combinations and going "okay but what if the instruction was THIS wide and the registers were THAT wide, would that be annoying enough?"

I wanted fixed-width instructions because variable-length encoding would have been a nightmare to implement in code and an even bigger nightmare for someone trying to reverse without docs (~~its a true tragedy I didn't do that, it'd have been sooo funny~~ I just couldn't bear such injustice, I'm so merciful yk). But what width? 32 bits felt too standard, anyone who's done any reversing would recognise it. 64 felt excessive for what I needed. I needed something that would look wrong.

I landed on 42 bits.

Why 42? Well, its the answer to life, the universe, and everything, and also because it made the math work out in a way that was just annoying enough. Not a clean power of two, which means anyone trying to figure out the instruction boundaries in a hex dump is going to have a fun time.

![scribbles](https://blogger.googleusercontent.com/img/a/AVvXsEi8bYBJT4Qr12kU1WglcS7oFXk8kv1GVGda-48EXu9jFMvnBAUQpG9_vPOjYIfbUDfy6qKFDFNq-T_p-K0BWhj0HzaMw5BcK21TBbBsINmjyjy7zcxpp46yTkpDIHP6jSMyePuKAPTB-_rb4vbzvt3qNu5SRxk7l8bdoBPSyfP5uk_Hif-IhI12WKXmXIfi)
*My pristine initial design plan is totally not a bunch of scribbles I had made at midnight*

For the data width I went with 24-bit signed registers. Three bytes per word. Not a power of two in the traditional sense for data widths (I mean technically it is if we take decimal powers but like, not the usual sane ones), which means anyone trying to pattern-match against known architectures would hit a wall immediately. The immediate field is also 24 bits, which gives a signed range of -8,388,608 to 8,388,607. Plenty of room for addresses within the code segment and for reasonable constants.

I was pretty happy with these numbers. They felt sufficiently cursed.


## Building the Machine

This is where the real suffering began, and I don't say that lightly cause I've written Windows shellcode from scratch before (you can read about that particular adventure [here](https://kernelmode0x0.blogspot.com/2024/08/windows-shellcoding.html) and [here](https://kernelmode0x0.blogspot.com/2024/09/windows-shellcoding-3-tcp-reverse-shell.html)) and even that didn't prepare me for the rabbit hole I was about to fall into.

### The Register Situation

So I start with the register file. Initially I had planned for 6 general-purpose registers because I wanted the ISA to feel somewhat capable. More registers = more flexibility for whoever is programming in it, right? Makes sense on paper.

But here's the thing - more registers means more bits to encode them in the instruction. With 6 registers I'd need 3 bits per register field, and with three register fields per instruction that's 9 bits just for registers alone. That was eating into my immediate field which I had planned to be 24 bits (42-bit Instr/24-bit data) but was considering scaling it down but it still needed to be atleast 16 bits to be useful for addresses and constants.

So I went down to 4 registers. But 4 had its own set of headaches. See, the good thing was 4 registers needs 2 bits to encode, but then I had a problem - I was going to have to send all registers with every instruction regardless of whether the instruction uses them or not. The encoding logic was getting messy and I was spending more time on edge cases than actual functionality.

I sat there staring at my notebook for a while and eventually had a moment of enlightenment: 3 general-purpose registers (`RA`, `RB`, `RC`) encoded in 2-bit fields.

Now here's the neat part - with 2-bit encoding, `00` could mean "no register in this field." So I had four states: `00` = unused, `01` = `RA`, `10` = `RB`, `11` = `RC`. This was elegant in theory. Three fields × 2 bits = 6 bits for registers, leaving me with a comfortable 24-bit immediate.

But then I did the math.

8 bits (opcode) + 6 bits (registers) + 24 bits (immediate) = 38 bits. Not 42.

I needed 4 more bits. This is where I actually used the wonderful concept of *reserved bits*. You might've seen them in x86 documentation before and well they have it for like when they expand stuff or add an extension. But for me, they exist because sometimes your bit math doesn't work out and you're too stubborn to give up on your 42 bit enconding so you add padding. (as a bonus that was another way to ~~fuck with~~ make it fun for players XD)

I added two 2-bit reserved sections that must always be zero, one between the opcode and registers, one between the registers and the immediate. This padded the instruction to exactly 42 bits and kept everything aligned nicely when stored in 6-byte little-endian slots:

```
Bits 41..34 : Opcode    (8 bits)
Bits 33..32 : Reserved  (2 bits, must be 0)
Bits 31..30 : Reg1      (2 bits)
Bits 29..28 : Reg2      (2 bits)
Bits 27..26 : Reg3      (2 bits)
Bits 25..24 : Reserved  (2 bits, must be 0)
Bits 23..0  : Immediate (24-bit signed)
```

I was actually quite proud of this. The reserved bits also serve a reversing purpose - if someone figures out they must be zero, they can use that as a validation check to confirm they've got the instruction boundaries right.

![color-coded](https://blogger.googleusercontent.com/img/a/AVvXsEiMgV9MvVtJO0-DZLNvVITpdbG-h0cbnnepOn7bvItuqRXXE6OrKQ3lPsXcK7ndLpiwoZK9Lai5kGfCXUxwn6fUNZB-0iWOENdB8Wvk8xQ2-zWhLgPkpHJPYK5ATfkeQGWxxr9lN1Fuz0J9HK5-J0wiUbwhXhhMmGyUqqyuCi0ryl3KCVtBMjJOL_JNz6y1)

### The Memory Layout (Some questionable choices were made here)

Now for the memory layout, I made an interesting decision which came to byte me later (heh, god that pun was awful).

The address space is 64-bit - the program counter and stack pointer are both 64-bit values. I set up three memory regions:

- Code segment at the bottom: 0x000000 to 0x0FFFFF (1 MB)
- Data segment above that: 0x100000 to 0x1FFFFF (1 MB)
- Stack way up at the top of the 64-bit space: 0xFFFFFFFFFFF00000 upward, growing downward

I figured players were never going to be directly seeing or using the 64-bit addresses - they'd interact with the ISA through the 24-bit registers and immediates. So having this massive address space wouldn't matter cause they'd never need to touch it directly.

Right?

(Narrator: It would matter. A lot.)

### The Fetch-Decode-Execute Loop

So the VM itself is a pretty standard interpreter loop in code. Fetch 6 bytes at PC, mash them together into a 48-bit integer (little-endian), ignore the top 6 bits, decode the fields, execute, advance PC by 6 (unless a jump or call was taken).

The decode function ended up looking like this:

```python
def decodeInstr(self, instr: int) -> tuple:
    op = (instr >> 34) & 0xFF
    rsv = (instr >> 32) & 0x03
    r0 = ((instr >> 30) & 0x03) - 1
    r1 = ((instr >> 28) & 0x03) - 1
    r2 = ((instr >> 26) & 0x03) - 1
    rsv2 = (instr >> 24) & 0x03
    imm = instr & 0xFFFFFF
    return op, rsv, r0, r1, r2, rsv2, imm
```

The `-1` on the register fields is a little trick - it maps `00` (no register) to `-1` which I use as a sentinel value I called `NOREG`. When an instruction requires a register and gets `-1`, the VM throws a runtime error via the `chkReg` function. Simple, effective, and only slightly cursed.

The reserved bits are checked at the start of `execInstr` - if either isn't zero, its a runtime error. This means any garbled or malformed instruction will immediately halt execution, which is actually quite helpful for debugging.

Now the execution itself is a big `match/case` block, like a civilised piece of software, none of the `if`-`elif` barabarism here. Each case corresponds to an opcode, and the logic for that opcode is contained within its case block. This keeps things organised and makes it easier to read (relatively speaking, it's still a nightmare of course, may god have mercy on the brave souls who actually reverse this VM XD).

```python
 def execInstr(
        self, op: int, rsv: int, r0: int, r1: int, r2: int, rsv2: int, imm: int
        ) -> None:
        # Execute a single instruction
        if rsv != 0:
            raise RuntimeError(f"Reserved bits (33..32) must be zero (got {rsv})")
        if rsv2 != 0:
            raise RuntimeError(f"Reserved bits (25..24) must be zero (got {rsv2})")

        imsgn = self.sgnExt24(imm)

        match op:
            case 0x00:  # HALT
                self.halted = True
            case 0x01:  # MOV
                self.chkReg(r0)
                self.regs[r0] = self.to24(imsgn)
            case 0x02:  # MOVR
                self.chkReg(r0)
                self.chkReg(r1)
                self.regs[r0] = self.regs[r1]
			... [truncated] ...
            case 0x1E:  # NEG
                self.chkReg(r0)
                self.regs[r0] = self.to24(-self.regs[r0])
            case 0x1F:  # SYSCALL
                self.chkReg(r0)
                if r0 != self.RA:
                    raise RuntimeError("SYSCALL Reg1 must be RA")
                if r1 != self.NOREG and r1 != self.RB:
                    raise RuntimeError("SYSCALL Reg2 must be RB")
                if r2 != self.NOREG and r2 != self.RC:
                    raise RuntimeError("SYSCALL Reg3 must be RC")
                res = self.handleSyscall(r1, r2)
                self.regs[r0] = self.to24(res)
            case _:
                raise RuntimeError(f"Unknown opcode: 0x{op:02X}")

        self.pc += self.INSTRSZ
```

### Designing the Opcode/SYSCALLs

I ended up with 39 opcodes total (initially there were only 31 but I had missed some crucial details). Designing these was probably the most fun part of the whole project cause I basically got to play ISA designer for real.

The usual suspects are all there - arithmetic (`ADD`, `SUB`, `MUL`, `DIV`, `MOD`), bitwise ops (`AND`, `OR`, `XOR`, `NOT`, `SHL`, `SHR`), memory access (`LOAD`, `STORE`, `LOADI`, `STOREI`), and control flow (`JMP`, `JEQ`, `JNE`, `JLT`, `JGT`, `JLE`, `JGE`). The ones with 'I' were made for immediate values as I decided to have yet another mercy on the players and differentiate between operations instead of using the same opcode for multiple formats (totally not because I was too lazy to implement it on my end). 

Then there's the missed detail: stack operations, in a twist of ridiculous irony, I initially didn't have them in the spec. I ended up making them later when I worked on the second challenge, where they didn't end up being used, and were just show pieces until I updated the `journey.asm` with them. I went with `PUSH`, `POP`, `CALL`, and `RET` for the basics, thought that was too basic and ended up adding the following too: `PUSHI` (push an immediate value), `PUSHA` (push all three registers at once), and `POPA` (pop all three registers). The PUSHA/POPA pair was something I added purely out of desperation later when I realised that saving and restoring registers with only 3 GPRs was absolute agony. Each push/pop writes/reads 8 bytes because the stack is 64-bit (yes even though registers are 24-bit, the stack entries are 8 bytes each, don't ask, it was easier not to think about it).

I also added convenience instructions like `MZERO` ('make zero' to well make a register value be zero, ik I'm such a genius), `INC`, `DEC`, `NEG`, and `MOVR` (register-to-register copy). These could all be done with the existing arithmetic instructions but having single-instruction versions makes the assembly much more readable when you're staring at it for hours.

And then there's `SYSCALL` - the bridge between the VM and the outside world. This was the big one.

I designed 9 syscall types initially:

| ID  | Name      | What it does                                                |
| --- | --------- | ----------------------------------------------------------- |
| 0   | EXIT      | Terminate the program with an exit code                     |
| 1   | PRINT_INT | Print a register value as a decimal integer                 |
| 2   | PRINT_STR | Print a string from memory (null-terminated or with length) |
| 3   | READ_INT  | Read an integer from input                                  |
| 4   | READ_STR  | Read a string into a buffer                                 |
| 5   | STRLEN    | Get string length                                           |
| 6   | STRCMP    | Compare two strings                                         |
| 7   | PRINT_HEX | Print a register value as hex                               |
| 8   | RANDOM    | Generate a random 24-bit value                              |

These covered everything I needed for the RPG - printing, reading input, string comparisons, random values for the combat system. Two more syscalls would be added later when I started building the PWN challenge, but that's a story for Part 2.

The `SYSCALL` encoding was tricky. The syscall number goes in `RA` (Reg1 field), and the two arguments go in `RB` and `RC` (Reg2 and Reg3 fields). After the syscall executes, the return value overwrites `RA`. This means every syscall clobbers your first register, which is another lovely limitation of having only 3 registers. And I had specified that the arguments must be in specifc order and only in specified registers, like in the `PRINT_STR` syscall, `RB` must have the string address and `RC` must have the length. So having this validation was another hour of debugging until that worked properly (Can't be merciful all the time).

### The Sign Extension Dance

One thing I want to talk about because it tripped me up repeatedly is the sign extension. When you have a 24-bit signed value and you need to use it as an address or do arithmetic, you need to properly sign-extend it. A 24-bit value of `0xFFFFFF` is -1 in signed, not 16,777,215.

I wrote two helper functions for this:

```python
def sgnExt24(self, val: int) -> int:
    if val & 0x800000:
        return val | (-1 << 24)
    return val & 0xFFFFFF

def to24(self, val: int) -> int:
    val = val & self.MASK24
    if val >= (1 << 23):
        val -= 1 << 24
    return val
```

`sgnExt24` takes a raw 24-bit value from the immediate field and sign-extends it to a full code int. `to24` takes any code int and wraps it back into 24-bit signed range. Every arithmetic result goes through `to24`. Every immediate goes through `sgnExt24`.

I spent an embarrassing amount of time debugging issues that turned out to be me forgetting to call one of these in the right place. Like, genuinely hours staring at wrong values going "why is this 16 million instead of negative one" before facepalming.

### Memory: Sparse Dict Instead of Array

Another design decision that worked out surprisingly well - the memory is a code dictionary (`Dict[int, int]`) instead of a byte array. Each key is an address, each value is a byte.

```python
self.mem: Dict[int, int] = {}
```

This means memory is sparse. Uninitialized addresses return 0 (via `dict.get(addr, 0)`), and I don't need to actually allocate 4 GB of memory for the full 64-bit address space. The stack can live at 0xFFFFFFFFFFFFFFFF and the code at 0x00000000 and they don't need to occupy every byte in between.

The downside is that its slower than a bytearray for sequential access, but for a code VM running CTF challenges, performance wasn't exactly my primary concern or even a concern really as long as this thing worked.


## The Assembler (or: "How bad can it a really be to manually write binary?")

Initially, I wasn't going to write an assembler.

I mean, how long could a custom binary really be? I'd just hand-encode the instructions in hex, maybe write a small helper to do the bit packing and then just plop the encoded instructions into a file. It'd be fine. People had hand encoded the whole firmware of the air data controller for F-14 Tomcat in 1969, my game would be hardly comparable to that.

That was what I told myself.

Then I actually planned out what "Unknown Runes" was going to be.

See, the ISA was sort of working at this point - I had the VM executing basic instructions, the fetch-decode-execute loop was functional, and I had just gotten my first set instructions, encoded manually, to execute correctly. The high from seeing an error free debug trace print in the terminal went straight to my head. I was riding the dopamine wave and my brain went "you know what would be really cool? A full text-based RPG."

I envisioned a full text-based RPG with multiple character paths, branching narratives, combat encounters, encrypted strings, hidden flags - the works. A whole game, running on my custom CPU.

Then I tried to make it, and I had to begrundingly scale it down due to reasons... so no complex combat system & fewer paths but it was still *way* too large to be hand-coded in binary. Like we're talking hundreds of instructions plus all the string data. Unless I wanted to lose whatever remained of my sanity, I needed an assembler. (a bit of context, the final binary ended up being 15KB, so yea I really saved myself from a world of pain here). 

I realised a few things, I was not living in 1969, I was not on US DOD's payroll, and that I had access to python. 

So I built one.

![Yay-It-works](https://blogger.googleusercontent.com/img/a/AVvXsEgfzCB7fPa11ITprw65rTCBDckw4QHXqgTA-6UAdOfCHf0yJ9EMqxymAgfn11sOgWKUn8XCrgny4TVQGOqAFQjJ7Dc9IgrS3yNC_w5XC2pv7LQ25k7JTatXS6BPZ6s2BQKwx_lTeTVQotCojDZPcMQw0JU_egQPl6hFdpR0t9r6VRf33AdQreHIZ1vacdPK)

### Two-Pass Assembly

The assembler (`codeISA.py`) is a classic two-pass design which I learnt from digging around online for systems programming projects/blogs. First pass collects all labels and computes their byte positions, second pass actually encodes everything into binary.

Each opcode has a format string that defines what operands it expects:

```python
FMTS = {
    'HALT': '',    'MOV': 'ri',   'MOVR': 'rr',  'ADD': 'rrr',
    'SUB': 'rrr',  'ADDI': 'ri',  'SUBI': 'ri',  'MUL': 'rrr',
    'DIV': 'rrr',  'MOD': 'rrr',  'AND': 'rrr',  'OR': 'rrr',
    'XOR': 'rrr',  'NOT': 'r',    'SHL': 'ri',   'SHR': 'ri',
    ...
    'SYSCALL': '*',
    'PUSH': 'r',   'POP': 'r',    'CALL': 'r',   'RET': '',
    'PUSHI': 'i',  'PUSHA': 'rrr', 'POPA': 'rrr',
}
```

Where `r` means register, `i` means immediate (usually, well it theortically could also work with labels but I didn't really test that), and `*` is the special wildcard for SYSCALL which takes a variable number of register arguments (cause some syscalls need 1 argument, some need 2, some need none).

Directives like `.DS` (define string), `.DB` (define byte), `.DW` (define word), and `.ALIGN` handle data embedding. The `.DS` directive was particularly fun to write cause I needed to support all the escape sequences - `\n`, `\t`, `\0`, hex escapes (`\x41`), the whole lot. I was basically implementing a mini string parser.

The encoding function packs everything into the 42-bit format:

```python
def encode(self, op, r0=0, r1=0, r2=0, imm=0):
    instr = (op << 34) | (r0 << 30) | (r1 << 28) | (r2 << 26) | (imm & 0xFFFFFF)
    return instr.to_bytes(6, 'little')
```

Note how the reserved bits just stay zero cause we never set them. Clean.

### Hello World in Rune ISA

A simple hello world in the ISA looks like this:

```bash
    MOV  RA, 2          ; syscall PRINT_STR
    MOV  RB, msg        ; string address (resolved by assembler)
    MOV  RC, 0          ; length 0 = null-terminated
    SYSCALL RA, RB, RC  ; print "Hello, World!"

    MOV  RA, 0          ; syscall EXIT
    MOV  RB, 0          ; exit code 0
    SYSCALL RA, RB      ; exit

msg:
    .DS "Hello, World!\n"
```

Not too bad right? The assembler resolves `msg` to its byte offset in the binary, stuffs that into the MOV's immediate field, and you're good to go.

Now imagine writing an entire RPG in this. With encrypted strings. And branching paths. And only 3 registers.

Yeah.

I was beginning to see what I had gotten myself into.


## The ImHex Pattern

So I also made this on the side, its a pattern for [ImHex](https://github.com/WerWolv/ImHex) (great hex editor, you can do so much with this). You can find the pattern [here](https://github.com/PrajwalNa/CTF26/blob/main/RuneISA/Source/rune.hexpat) actually.

The colored representation of the rune binary before and below are made by that pattern. I made a basic disassembler in this, it even color codes stuff for you.

![HexPat](https://blogger.googleusercontent.com/img/a/AVvXsEi0mrTautia3GPbRcBTkTBNFHjsa9O1DoEwOhkt41t0A2J7H5NSWxM-vQu-5_WTNia8G9DXvhpyRJW5AnSGZnFxyB0PNms4vaOdc468VO0ngsFZExse-hERA36vQREs6OVQwic_WLxkJyHxolHPAUzMzvly6F9z3hZUvIwEU35-uz_hkMMvTsFFXq6kUEEx)
*If you couldn't tell, I was pretty proud of myself for this XD*

## Unknown Runes: The RPG Challenge

So I built the first challenge - a text-based RPG called "Unknown Runes" where you play through branching paths in the city of Stormhaven. Trust me, this is the scaled down version, I really need to stop my ideas from snowballing like this. I mean this whole thing was beautiful but damn was it a pain to build. (Moments like this is when I thank teenage me for not going into Software Engineering)

Here's the scenario the players would see:

> *The ancient city of Stormhaven has long been a crossroads for those seeking power, glory, gold, or simply a warm meal.*
>
> *A mage hunts forbidden knowledge to empower him. A warrior settles debts with his fists in a tavern that's seen better days. An artificer races to breathe life into cracked metal and silent circuits. A bard does what bards do - everything wrong, beautifully.*
>
> *Four paths. Four fates. Somewhere in the whispers of the city, answers are hidden, but they won't be in any tongue you recognise.*
>
> *Not all paths lead to glory. Choose wisely.*

I was really proud of that last line cause it's literally telling the player that some paths are dead ends. It's a hint, hidden in plain sight in the flavour text. Most people skip the description and go straight to the binary so its for the lucky few (if they even get to that part lmao) :P

The game has four character classes you can pick: Mage, Warrior, Artificer, and Bard (I had fun building their stories tbh, writing random stuff as a hobby helped here).

The catch? Only two paths (Mage and Artificer) have answer prompts where you need to input a passphrase to get a flag. Only one of those (Mage) gives you the real flag (only in the dark path though). The Artificer path gives a decoy flag. And the remaining two paths (Warrior and Bard) are complete dead ends - they exist purely to waste your time if you're trying to brute-force the game by playing it instead of reversing the binary.

I was quite pleased with this design cause it means even if someone dumps memory and finds all the encrypted strings, they still need to figure out which decryption key goes with the actual flag vs the decoy. Multiple layers of misdirection (hours of strategy experience from playing 4X/Grand Strategy games XD).

### The GenJourney Script

Now here's the thing - I realised very quickly that if I'm going to be XOR-encrypting all the game strings (so they're not visible in a hex dump), I am NOT doing that by hand. Especially not the longer narrative strings. Some of these strings are full paragraphs of RPG narrative text.

So I wrote `genJourney.py`, a code script that generates the entire `journey.code` file for me. I put in it all the game strings, the XOR keys, the branching logic, and voila! hundreds of lines of assembly with all the strings already encrypted. It was also here that I realised how much pain I just subjected myself to by intentionally only making 3 registers, I still had to write the assembly, just made some of duplicate parts of the code more uh reusable.


### The Encryption Scheme

The encryption uses three-byte XOR keys, which matches perfectly with the 24-bit LOAD/STORE width of the ISA (this was actually planned, not a coincidence, I'm quite proud of that one). Five different keys in total for different parts of the game:

| Purpose                  | Key (LE)   | Key (BE)   |
| ------------------------ | ---------- | ---------- |
| All printed game strings | `0x1FF1AD` | `0xADF11F` |
| Mage answer              | `0x44A0CF` | `0xCFA044` |
| Mage flag                | `0x46534E` | `0x4E5346` |
| Artificer answer         | `0xAFCEFA` | `0xFACEAF` |
| Artificer flag           | `0xAFAD0B` | `0x0BADAF` |

Now, if you look at those big-endian values and squint a little... 
`0xADF11F` = **ADF-11F** Raven, `0xCFA044` = **CFA-44** Nosferatu, `0x4E5346` = **NSF**. 
These are all fighter jets from Ace Combat. I have a bit of a habit of hiding references from games (usually Ace Combat/Civ) in my challenges XD. In another challenge I made (Crown of Mirrors), the XOR key was literally the ASCII bytes of "FALKEN" (`0x46, 0x41, 0x4C, 0x4B, 0x45, 0x4E`) and the salt was `0x0A, 0xDF, 0x01` - ADF-01 FALKEN, and there was another where the key was 'VIPER' for F-16V. Nobody would usually get them but if you do, it's a fun little easter egg.

The reason for having separate keys is so that even if someone figures out the main string key, they still need to figure out the answer and flag keys separately. Each key is 24 bits (3 bytes) so the XOR decryption works word-by-word using the ISA's native LOAD/XOR/STORE operations.

### The fn_print Decrypt-Print-Reencrypt Pattern

The `fn_print` function in the assembly is where the magic happens. It decrypts strings in-place, prints them via SYSCALL, then **re-encrypts** them before returning. Strings are never sitting in cleartext at rest in memory. This means even if someone is clever enough to dump memory mid-execution, they'd only catch the string that is currently being printed - everything else stays encrypted.

Here's what that looks like in assembly:

```bash
fn_print:
    STOREI RA, 0x10000C     ; save string addr to data memory
    STOREI RB, 0x10000F     ; save padded length
    MZERO RC
    STOREI RC, 0x100012     ; loop index = 0
fn_pr_dec:
    LOADI RA, 0x100012      ; load loop index
    LOADI RB, 0x10000F      ; load length
    JGE RA, RB, fn_pr_print ; done decrypting? jump to print
    LOADI RB, 0x10000C      ; load base addr
    ADD RB, RB, RA          ; current byte addr = base + index
    LOAD RC, RB             ; load encrypted word
    MOV RA, 0x1FF1AD        ; XOR key
    XOR RC, RC, RA          ; decrypt
    STORE RB, RC            ; store decrypted word back
    LOADI RA, 0x100012      ; reload loop index
    ADDI RA, 3              ; increment by 3 (one word = 3 bytes)
    STOREI RA, 0x100012     ; save updated index
    JMP fn_pr_dec           ; loop
fn_pr_print:
    ; ... print the decrypted string ...
    ; ... then re-encrypt it with the same key ...
```

Look at that. Look at how many times I'm loading and storing to data memory. That's because I only have 3 registers and I need ALL THREE just for the XOR loop - one for the loop index, one for the current address, and one for the data/key. The moment I need a fourth value I have to spill something to memory.

Three registers. That's it. Every. Single. Time.

I wrote the entire RPG like this. Every function. Every branch. Every comparison. If someone out there is designing a custom ISA and is considering going with 3 registers - don't. I'm begging you. Four or five would have been the sweet spot between "make people experience hell" and "not wanting to end your own career."

![RPG](https://blogger.googleusercontent.com/img/a/AVvXsEjV6JRk79YQ0SneJnvZsF1BgnDpEr1Yv7EHEEaxpD_lk0isYD52xT75bxswHVvvyukHkYqpOkYIykRXaFQFC-6dz93Iv8-4wWj0Vzt51cpsCJ6VvoZpnSNZHve-_y47ZLAXzfKUiEVUjxhX5X88VNZB3dXgW9ScfP6LDUGlS9aPtBp81I59TEzkkxTDU2fq)
*It was pretty satisfying, even if you were to just play the game.*

### Server and Docker

The whole thing runs inside a Docker container with a TCP server (`server.py`) on port 666 (yes, really). Players connect with netcat, play the RPG over the network, and try to figure out what's going on under the hood. The `.rune` binary file is also provided for download so they can analyze it offline. I also very helpfully included the VM, compiled into a binary using nuitka (can't have them decompiling it back to code), so they can try their luck at reversing this monstrosity (no I'm not joking, nuitka converts the code OOP to C++ first, and anyone who's ever tried to reverse C++ knows how much fun that is).

```python
vm = UnknownRunesVM(inStream=r, outStream=w)
vm.loadProgFile(PROGRAM)
vm.run()
```

The server wraps the socket in a file-like object with UTF-8 encoding so the VM's `inStream` and `outStream` can use `.readline()` and `.write()` just like stdin/stdout. Each client gets their own VM instance on a separate thread. Clean separation.

As an added bonus for the players, the VM binary they get for the challenge has the debug functionality completely stripped and non-verbose error messages. Can't have players getting helpful error messages, that would be too easy :P


## What's Next

So at this point I had a fully working custom ISA, a VM, an assembler, a text-based RPG with encrypted strings and branching paths and decoy flags, all wrapped up in a Docker container. One reversing challenge, done.

But after sinking all that time into building an entire ISA, assembler, and VM, it felt like such a waste to just make one challenge and be done with it. All that infrastructure for a single reversing problem? Nahh.

So I expanded the narrative universe lol. I built a PWN challenge on the same ISA - a buffer overflow exploit with ASCII-safe shellcode, and it involved building stack operations, fighting with the 64-bit/24-bit address mismatch, and hiding a flag in an xattr. That story has its own set of suffering and is covered in Part 2: [A Legacy of Darkness: Building a PWN Challenge on My Own CPU](https://kernelmode0x0.blogspot.com/2026/03/building-pwn-challenge-on-my-own-cpu-i.html).

But before that, let me talk about some bugs I found during the code audit, cause they're genuinely entertaining.


## The Bugs That Humbled Me

So after getting everything working and deployed, I decided to do a proper audit of the codebase. I sat down and went through the spec document line by line, comparing every single thing to what the VM code actually does.

This was... humbling. And by humbling I mean I found five bugs that by all rights should have caused problems but somehow never did during testing. Let me walk through each one cause they're all interesting in their own way.

### Bug 1: NEG Was Off By One

The spec says `Reg1 = -Reg1`, i.e., 2's complement negation. So if RA is 5, you should get -5.

The VM code had:

```python
self.regs[r0] = self.to24(-self.regs[r0] + 1)
```

See that `+ 1`? That's wrong. That's computing one's complement + 1 of the negative, which... doesn't give you 2's complement negation. If RA was 5, the spec says you'd get -5, but the code was producing -4. An off-by-one.

How did this never manifest during testing? Because I never used NEG until in challenge where it cause me tears of bloos. In my brilliance I never wrote a test that NEG'd a value and then checked the result precisely. I always just used `MZERO` followed by `SUB` when I needed negation cause in my smaller scale tests.

The fix was trivial:

```python
self.regs[r0] = self.to24(-self.regs[r0])
```

But I felt pretty silly about it.

### Bug 2: SYSCALL's Validation Was Checking the Wrong Thing

The SYSCALL handler had this check:

```python
if r0 != self.regs[0]:
    raise RuntimeError(...)
```

This was supposed to verify that the first register field is RA. But look at what its actually comparing - `r0` is the register *index* (0, 1, or 2 after decoding), and `self.regs[0]` is the *value stored in register RA*.

These are completely different things! The check would accidentally pass or fail based on whatever value RA happened to hold at the time. If RA contained 0 and the register field was 0 (RA), it would pass for the wrong reason. If RA contained 42, it would always fail even though the register field was correct.

Should have been `if r0 != self.RA` where `self.RA = 0` is the constant for RA's index.

I spent way too long staring at this one going "how did this ever work" in my tests. After a *thorough* audit I realised that in my measely tests, I was not parsing the register correctly and the first reg ended up being one of the three SYSCALLS that I used EXIT (0), PRINT_INT (1), PRINT_STR (2) and the luckily register index in those cases would be the relevant SYSCALL, so the check would pass whenever the syscall number was 0 (EXIT) and some random fails which I wrote off.

I did fix it properly and made a whole new asm test for all opcodes after this, can't be having more issues popping up randomly.

*Do not try this at home, kids. Always do your tests properly and not half ass them like me*


### Bug 3: The Debug Trace Read The Wrong Register

This one is my favourite cause its a code quirk that I should have known about.

When a register field is unused, it decodes to -1 (`NOREG`). The debug trace code was printing register values for all three fields, including unused ones. So it would do `self.regs[-1]` when a field was unused.

Now in code, negative indexing is valid. `list[-1]` gives you the last element. So `self.regs[-1]` silently returns the value of RC (the third register, index 2) instead of throwing an error.

I spent way too long confused by debug output showing phantom register values before I caught this one. The fix was adding a guard: only print the register value if the index is >= 0.

All of these were fixed but it was a solid reminder that writing a spec and writing an implementation are two very different mental activities, and your brain will absolutely convince you they match when they don't. Especially at like 3 AM.


## Closing Thoughts

So building Rune ISA and Unknown Runes was genuinely one of the most educational (and painful) things I've done. I went in thinking "I'll just make a small custom architecture, how hard can it be" and came out with a full ISA spec, a VM, an assembler, a text-based RPG with encrypted strings and branching narratives, Docker deployment, and several hours of questioning my career choices.

The whole thing took way longer than I expected (obviously). There were multiple points where I questioned if I should just scrap the whole idea and make a normal reversing challenge like everyone else. But I'm glad I didn't, cause the look on the faces of the few when they opened the binary in a disassembler and nothing makes sense was absolutely worth it B)

Things I learned (that I'll probably ignore next time):

- **More registers.** Three is fun for CTF players to reverse, absolute agony to program for. Four or five would have been the sweet spot between "a walk through hell for the solver" and "not making the challenge author want to throw their very expensive laptop out the window."
- **Write the assembler first.** I know I said "how long could a binary be" but honestly the assembler should have been step one after the spec. The moment you have even a moderately complex program to write, you need it, I have the utmost respect for people who had to manually encode binary for CPU's back in 70s & 80s.
- **Don't write the spec at 3 AM.** alternatively, *Don't drink & code* . You'll end up writting lazy tests which will make you cry blood when actual program is run.

If you want to look at the code, ~~suffer along with me~~ check it out here: [ISA Spec](https://github.com/PrajwalNa/CTF26/blob/main/RuneISA/Source/ISAspec.md)

In Part 2, I'll talk about how I turned this ISA into a PWN challenge - complete with stack operations I built specifically for exploitation and then couldn't use, ASCII-safe shellcode constraints, and hiding a flag in an xattr. The suffering continues.

Hope you enjoyed this so far, it was atleast a fun descent!


P.S.> The ISA is called "Rune" cause technically in a magical world, languages like code would very much be treaded like Runes, like you invoke them on a rock and the rock is suddenly capable of thinking. (statutory disclaimer: this is a very simplfied version of how CPUs work, please don't hurt me)

P.S.> I put Ace Combat or Civ references in basically all my challenges at this point, its kind of a fun little signature thing. If you ever see suspiciously aviation-themed hex values in a CTF challenge, it was probably me.

If you read this far already, [here's a treat for your ears](https://www.youtube.com/watch?v=AF5kZxxqeZE)
Search This Blog

Prajwal's Blog

I Made a CPU for a CTF Challenge (ft. my blood, sweat, and tears)

Comments

Post a Comment

Popular posts from this blog

Building a PWN Challenge on My Own CPU (I swear I don't like pain)

Lab: Malware Basic Analysis

Windows Shellcoding - 1 : Using WinExec

Windows Shellcoding 3 : TCP Reverse Shell using WinSock