Reverse your first VM-obfuscated code
PREAMBLE
While selecting challenges for a series on reverse engineering to bring to our hackerspace https://www.meetup.com/it-IT/meethack/, I thought I would bring an easy vm. This one seemed perfect. However, I realized that the majority of the blog posts online about VMs are complex cases, such as VMProtect or custom encrypted multi-stage VMs etc…
So why not do a post on how to approach your first VM by explaining some concepts? So here it is!
VM INTRO
The challenge takes a user input and then runs a VM obfuscated code to determine if the user input is correct or not.
CHALLENGE LINK
https://crackmes.one/crackme/5ab77f6233c5d40ad448c9d9
ANALYSIS
Not being very familiar with the Windows world, I used some quick strategies to identify the VM and the VM’s OPcode.
To identify the VM I started from the _start function and looked at the called functions.
I reached the function at address 0x0040102f
. Looking at it closely, it has the typical structure of a VM dispatcher, which takes the OPcode of the VM and, depending on the value, performs different operations.
So that’s the dispatcher. The caller function sets data_403010 value as a pointer to 0x4016bc
, the memory location containing the VM OPcode.
So we can consider data_403010 as a VM instruction pointer “vip”.
The other value (first argument) passed to sub_401000 is the memory location containing the user input.
One quick way to find out this information is simply to run a debugger (I used x32dbg). Put a breakpointer at the beginning of VM code, run the program, insert any value and press “Verify Password”. The breakpoint will be hitted stopping the program and now you can simply ask the debugger to find the address where that value resides.
At this point, the actual reverse of the VM OPcode can be performed.
We are gonna take the instructions to be “decoded” IN ORDER and write a de-virtualizer.
However, this will only create a valid de-virtualizer for this specific OPcode sequence. If the virtualizer supports other OP codes not found here, our script would no longer work and would need to be extended.
This could occur in a real-world case of a malware that, depending on the circumstances, downloads a different OPcode sequence and executes it. Clearly, the internal VM will support more OPcodes than those used.
If we have enough information (such as registers location, special memory area usage etc…) we could try to de-virtualize the other dispatcher instructions not used by the specific sequence.
BUT we have to de-virtualize the instructions in order. This is because in the OPcode there are also data such as integers or characters which are therefore not code. In fact, in this challenge there are some integer values. Also, some instructions are easier to be de-virtualized if you approach them in order. An example is the ‘jnz’ construct. In fact, the previous instructions will perform mathematical or logical operations and will set bits for control instructions such as ‘jmp’. This then, allows us to guess which memory area is being used to save the VM’s FLAG registers.
Also, it will be easier to understand what type of control it is. In fact knowing that the previous instruction sets a flag to 0, for example, we can easily deduce that the next instruction is a jmp of type jnz or jz etc…
Now we need to understand how the OPcode is processed. Fortunately, no decryption or anything else is required. In fact, the OPcode is processed one byte at a time. Every byte is split in 2. The highest 4 bits (lowest 4 cleared) are used by the dispatcher to ‘distribute’ the operation. In other words, to choose which function to execute. The lowest 4 bits (highest 4 cleared) are used as an argument for the called function.
Let’s take the first OPcode instruction: 0x39. The highest 4 bits are 0011 = 0x3. The lowest 4 bits are 1001 = 0x9. The dispatcher will execute the case 0x30, invoking the function sub_4014bd with 0x09 as argument.
Essentially for 0x39 the virtual instructions will be:
mov vm_reg2, [vm_reg7 + vm_reg8]
add vm_reg7, 4
Another OPcode is 0xd8.
Looking at the disassembly, since the decompilation is actually wrong due to misinterpretation from Binja, we can see that the first thing that the function does is to put 0x00000000
at address 0x00403020
. That address contains the VM flags. It’s possible to deduct it because of the behaviour.
For example, in this function, you can see that the last bit is set to 1 if the comparison of 2 registers results in a “true” result (same value). In the next OPcode it is described how that value will be used for a conditional branch (jz/jnz). So you can imagine that the address contains VM flags like the Zero flag…
But let’s do one step at a time.
After the zero initialization of the VM flags, it compares register ‘dl’ (containing value 0x8 from our lowest byte of OPcode) with some stuff (similar to a “case switch” statement). In this case, a function extracts a 32-bit value from the next 4 bytes of OPcode. The result is saved in eax.
Then vm_reg1 is loaded in edx.
eax and edx are then compared and based on the result, the last bit of the vm flags is set accordingly.
In x86, a cmp is realized as a subtraction in the microprocessor (e.g. cmp eax, ebx –> sub eax, ebx). As a result, various flags are set accordingly as described here: https://c9x.me/x86/html/file_module_x86_id_308.html.
In this way, it is possible on the basis of the result to make jumps both for the equal case and the minor / major case.
This time, however, only the zero flag is used. Consequently, all jump statements after this cmp will be of type jz (je) or jnz (jne).
NOTE: in this challenge, 0xd8 appears only 2 times, always followed by 0x00000000
, so we can directly map the OPcode with the resulting operation. If ‘d8’ had appeared with other arguments, we would have implemented a piece of code that replicates the VM behaviour to obtain the resulting code.
Since the relationship between the cmp and jmp was mentioned, the following statement is shown (0x20):
Essentially, the 4 bytes following the vip are extracted and added to vm_reg6 (which points to the beginning of the OPcode).
After that, it is checked if the last bit of vm_flags (the ZERO FLAG) is set to 1 (the cmp has compared 2 equal values). If it is, a jmp (je therefore) is done at the value calculated before. The main function of the VM will not increment the VIP in this case. Otherwise, it returns and the vip is then incremented by 1.
Following the same methodology, the remaining OPcode could be reversed.
REGISTER INITIALIZATION
Now that we have reversed the code, we need to retrive the VM registers initial values. Registers are initialized before starting the VM. You can retrieve their initial value by statically reverse the code.
But it is not always easy, like in those cases where values are set based on some operations, somewhere in the code. The simple but efficient approach of setting a breakpoint where the VM starts processing the OPcode and examines its value is always valid.
So we get:
vm_reg1 = 0x00000000
vm_reg2 = 0x00000000
vm_reg3 = 0x00000000
vm_reg4 = 0x00000000
vm_reg5 = 0x004016BC
vm_reg6 = 0x004016BC
vm_reg7 = 0x00000000
vm_reg8 = 0x00403049
vm_flags = 0x00000000
Here is the python (3.11) solver:
OPcode = bytes.fromhex('39b0cb00000000ca000000004240ff000000d800000000202600000063303be9b0100c000000c89a02000003d800000000203c000000321042000000ca180000003243')
reg_init = 'vm_reg1 = 0x00000000\nvm_reg2 = 0x00000000\nvm_reg3 = 0x00000000\nvm_reg4 = 0x00000000\nvm_reg5 = 0x004016BC\nvm_reg6 = 0x004016BC\nvm_reg7 = 0x00000000\nvm_reg8 = 0x00403049\nvm_flags = 0x00000000'
def main():
print('Register initial value:\n{0}\n\n'.format(reg_init))
print('CODE:')
i = 0
while i < len(OPcode):
match OPcode[i]:
case 0x39:
print('+{0:x}:\t{1}\n\t{2}'.format(i, "mov vm_reg2, [vm_reg7 + vm_reg8]", "add vm_reg7, 0x4"))
case 0xb0:
print('+{0:x}:\t{1}'.format(i, "mov vm_reg1, vm_reg2"))
case 0xcb:
print('+{0:x}:\t{1}{2}'.format(i, "mov vm_reg4, ", hex(int.from_bytes(OPcode[i+1:i+5]))))
i += 4
case 0xca:
print('+{0:x}:\t{1}{2}'.format(i, "mov vm_reg3, ", hex(int.from_bytes(OPcode[i+1:i+5]))))
i += 4
case 0x42:
print('+{0:x}:\t{1}'.format(i, "mov vm_reg1, [vm_reg1]"))
case 0x40:
print('+{0:x}:\t{1}{2}'.format(i, "and vm_reg1, ", hex(int.from_bytes(OPcode[i+1:i+5]))))
i += 4
case 0xd8:
print('+{0:x}:\t{1}{2}'.format(i, "cmp vm_reg1, ", hex(int.from_bytes(OPcode[i+1:i+5]))))
i += 4
case 0x20:
print('+{0:x}:\t{1}{2}'.format(i, "je vm_reg6 + ", hex(int.from_bytes(OPcode[i+1:i+5]))))
i += 4
case 0x63:
print('+{0:x}:\t{1}'.format(i, "add vm_reg1, vm_reg4"))
case 0x30:
print('+{0:x}:\t{1}\n\t{2}'.format(i, "mov [vm_reg7 + vm_reg8 - 4], vm_reg1", "sub vm_reg7, 4"))
case 0x3b:
print('+{0:x}:\t{1}\n\t{2}'.format(i, "mov vm_reg4, [vm_reg7 + vm_reg8]", "add vm_reg7, 4"))
case 0xe9:
print('+{0:x}:\t{1}'.format(i, "inc vm_reg2"))
case 0x10:
print('+{0:x}:\t{1}{2}'.format(i, "jmp vm_reg6 + ", hex(int.from_bytes(OPcode[i+1:i+5]))))
i += 4
case 0xc8:
print('+{0:x}:\t{1}{2}'.format(i, "mov vm_reg1, ", hex(int.from_bytes(OPcode[i+1:i+5]))))
i += 4
case 0x03:
print('+{0:x}:\t{1}'.format(i, "xor vm_reg1, vm_reg4"))
case 0x32:
print('+{0:x}:\t{1}\n\t{2}'.format(i, "mov [vm_reg7 + vm_reg8 - 4], vm_reg3", "sub vm_reg7, 4"))
case 0x43:
print('+{0:x}:\t{1}'.format(i, "ret"))
case default:
print('+{0:x}:\t{1}'.format(i, b'UNKNOWN'))
i += 1
if __name__ == "__main__":
main()
The output is:
Register initial value:
vm_reg1 = 0x00000000
vm_reg2 = 0x00000000
vm_reg3 = 0x00000000
vm_reg4 = 0x00000000
vm_reg5 = 0x004016BC
vm_reg6 = 0x004016BC
vm_reg7 = 0x00000000
vm_reg8 = 0x00403049
vm_flags = 0x00000000
CODE:
+0: mov vm_reg2, [vm_reg7 + vm_reg8]
add vm_reg7, 0x4
+1: mov vm_reg1, vm_reg2
+2: mov vm_reg4, 0x0
+7: mov vm_reg3, 0x0
+c: mov vm_reg1, [vm_reg1]
+d: and vm_reg1, 0xff000000
+12: cmp vm_reg1, 0x0
+17: je vm_reg6 + 0x26000000
+1c: add vm_reg1, vm_reg4
+1d: mov [vm_reg7 + vm_reg8 - 4], vm_reg1
sub vm_reg7, 4
+1e: mov vm_reg4, [vm_reg7 + vm_reg8]
add vm_reg7, 4
+1f: inc vm_reg2
+20: mov vm_reg1, vm_reg2
+21: jmp vm_reg6 + 0xc000000
+26: mov vm_reg1, 0x9a020000
+2b: xor vm_reg1, vm_reg4
+2c: cmp vm_reg1, 0x0
+31: je vm_reg6 + 0x3c000000
+36: mov [vm_reg7 + vm_reg8 - 4], vm_reg3
sub vm_reg7, 4
+37: jmp vm_reg6 + 0x42000000
+3c: mov vm_reg3, 0x18000000
+41: mov [vm_reg7 + vm_reg8 - 4], vm_reg3
sub vm_reg7, 4
+42: ret
At this point, it is possible to solve the challenge.
But for the sake of completeness, a brief description of the solution is provided.
The code cycles through user input. Specifically, every character entered up to the string terminator ‘\x00’ is taken. The characters are added up. Then an xor is performed with the value 0x9a020000 (ie with 0x029a). If the result is equal to zero (therefore the sum of the characters is equal to 0x029a), the value 0x18 is put in vm_reg3, 0 otherwise. The result is then saved at the address 0x403049. Its value will be used as an offset to be added to calculate the address of the function to be invoked. If it is equal to 0, the function 0x40180b will be called (‘Bad Boy!'), otherwise it will be called 0x0040183a (‘Good Boy!').
Any input whose sum is 0x29a (666) will work, such as ‘JJJJJJJJJ’