Abusing Intel TSX Instructions for Fun and CTF Flag
Using intel TSX/Instructions to create an egg hunter shellcode and solve a CTF challenge.
Intro
My friends and I at Microsoft participated in the HTB Business CTF 2024. The event was fun, and we managed to solve 58 out of 63 challenges, which granted us the 15th place among 943 teams.
I personally dedicated most of my time to the following challenges:
Challenge | Category | # Teams that Solved |
---|---|---|
Insidious | Pwn (Binary Exploitation) | 7 |
SOS or SSO? | Web | 23 |
Living with Elegance | Crypto | 127 |
Chrono Mind | Misc | 143 |
Don’t Panic | Reverse Engineering | 165 |
Additionally, I contributed to the following challenges:
Challenge | Category | # Teams that Solved |
---|---|---|
Blueprint Heist | Web | 63 |
HTB Proxy | Web | 71 |
Swarm | FullPwn | 311 |
Tangled Heist | Forensics | 319 |
Submerged | FullPwn | 657 |
While HTB offers official write-ups, I wanted to share my own experience with the Insidious challenge. With only 7 teams managing to solve it, Insidious was the third least solved challenge. I discovered an unintended solution not covered in the official write-up, and I’m excited to share it with you.
TL;DR
The Insidious (pwn) challenge allowed the execution of shellcode in a highly restricted seccomp1 sandbox. The sandbox blocked all system calls except for the write syscall, which could only access a specific memory address containing the flag. The challenge authors expected participants to solve the challenge using a Prefetch Side-Channel attack2. However, I solved it by implementing an Egg Hunting shellcode that utilized the xbegin3/xend4 instructions, which are part of the TSX extensions5. This was necessary to discover mapped pages without using any syscall.
The shellcode that retrieved the flag was based on the SGXROP paper 6 and proof of concept 7:
|
|
Snippet from Practical Enclave Malware with Intel SGX paper 6:
Egg Hunting. We also evaluated TAP as a novel method for egg hunting in regular (non-enclave) attacks, i.e., scanning the address space for injected shellcode. State-of-the-art egg hunters for Linux rely on syscalls (e.g., access) which report whether one of the parameters is a valid address. However, issuing a syscall requires the syscall instruction as well as setting up the required parameters. Thus, such egg hunters are usually larger than 30 bytes. Nemeth et al. argued that egg hunters with fault prevention cannot be smaller. However, our TAP egg hunter is only 16 bytes in size, i.e., the smallest egg hunter with fault prevention. With 360 cycles per address, it is also significantly faster (by a factor of 4.8) than egg hunters leveraging the access syscall (1730 cycles per address).
The remainder of this post will delve into the challenge details and my journey, including the failures, to ultimately arrive at the payload that secured the flag.
Prolog
As usual, HTB provided a Dockerfile with a fake flag for participants to run the challenge locally for testing purposes, once solved locally, the participants can execute the same attack against a remote server that contains the real flag.
➜ HTB24 sha256sum pwn_insidious.zip
0d5d46e3ea4bbb58a5ae3e5df181f5fcd7570c1d4c03b2d15ea9b19288cd28e3 pwn_insidious.z
ip
➜ HTB24 7z x pwn_insidious.zip
...
➜ HTB24 tree
.
├── pwn_insidious
│ ├── build_docker.sh
│ ├── challenge
│ │ ├── flag.txt
│ │ └── insidious
│ └── Dockerfile
└── pwn_insidious.zip
3 directories, 5 files
The insidious
is an ELF 64-bit binary, not stripped:
➜ challenge file pwn_insidious/challenge/insidious
pwn_insidious/challenge/insidious: ELF 64-bit LSB pie executable, x86-64, version
1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sh
a1]=ab678b13fe5804f5390bc74d9df006494e27c6cd, for GNU/Linux 3.2.0, not stripped
Connecting to the program, a [y/n]
prompt is presented, answering y
and a passcode is requested.
Inputing a 00000000
passcode returns Wrong pass code
:
If we select n
, we receive a shell running with the ctf
user privileges.
The flag is in the disk but is the access is restricted to root
.
Reversing
During the competition, only the binary was available to competitors, necessitating the use of disassembler/decompiler tools to solve the challenge. My go to tool for CTFs is the Binary Ninja
. Since this is a binary exploitation (pwn) challenge, I will also use the source-code8, released after the competition, to explain the challange in a more concise way.
Pass code validation
After the y
switch case statement, the main
function calls the load_flag()
(L116) and drop_privs()
(L117) functions, reads the user passcode using scanf
(L133), and compares the result with the return of the create_passcode()
(L134).
|
|
After discarding some of the values obtained using rand
(L45) in a loop, create_passcode
performs some arithmetic operations on random numbers obtained using rand()
.
|
|
The PRNG is seeded in the setup
function using the current time time(NULL)
with a precision of seconds.
|
|
Two numbers generated in the same second time window will have the same password. By copying the Binary Ninja decompilation of create_passcode
and doing some clean-up, we have a C program with equivalent functionality.
|
|
The Real Challenge Starts
Now that we are able to generate the correct passcode, what exactly should I provide to the Can you share the detailed 80-step guide for a systematic path to the exit?
prompt? Back to reversing!
On line 139, a new memory allocation with Read, Write, and Execute (RWX) permissions is created based on the shellcode_addr
variable. Later, on line 143, 80 bytes (the shellcode size limit) are read and copied to this new location.
Some references to seccomp_load
(L146), seccomp_release
(L147), and memset
(L150-152) are noticed but not relevant at that time.
If all the references to symbols containing shellcode
weren’t clear, we identify that at the end of the main
function, the 80 bytes received on L143 are executed as shellcode.
|
|
Journey to the Flag (Failed Ideas)
Send a Shell Shellcode
Could we just get the flag by executing a reverse shell payload? Not so fast!
As demonstrated in the Pass Code Validation section, the drop_privs
function is executed after the flag is loaded into memory. Anything happening after that call is executed with ctf
user (id 1000) privileges and not root
user (id 0).
Also, we already have shell access by using the n
in the first prompt prolog section.
Reading the Flag from Memory
So we are supposed to read the flag from memory, but where is the flag located in memory?
We quickly realize a new insidious
process is created for each connection using socat
.
The binary has modern protections in place, such as ASLR. It’s time to understand load_flag
and hopefully find a register or a fixed memory address pointing to the flag in memory.
Brute-forcing the Flag Location
Binary exploitation often involves a bit of luck. Let’s take a gamble!
An initial analysis of the load_flag function led us to believe the flag address entropy was only 4096 possibilities (12 bits / 3 nibbles). However, it is actually 16,777,216 possibilities (24 bits / 3 bytes).
The var_38
contains 3 random bytes from /dev/urandom
, shifted left by 2 bytes (address 0x16ec). It is then mmap
ed with RW permissions (proto 3) at address 0x1712. The flag.txt
is open
ed (address 0x1800) and read
(address 0x184f).
We implemented a brute-force attack but quickly realized the mistake. The real CTF challenge server includes proof-of-work validation, slowing down such attacks. We stopped the attack and tried other ideas.
Find the Flag Pointer in Memory or Registers
In binary exploitation, a typical method to bypass Address Space Layout Randomization (ASLR)9 involves identifying a register that points to a specific target memory area, such as the main executable or a library. Alternatively, it may involve finding an offset from a pointer in a register.
In our case, we don’t know what the 3 bytes read from /dev/urandom
were, but if any register still points to the flag location or a memory location with a reliable offset to the flag, we can create shellcode to print the flag.
I tried to find such leaks using the gdb+gef10 debugger, but none were present.
It might not be clear from the decompilation output, but the challenge authors took care to remove references to the flag address from the stack and heap. This becomes evident after the challenge source code is released.
Cleaning flag address from stack (decompilation)
The allocation
and address
is set to NULL
(0x18af, 0x18b7) in the stack.
Cleaning flag address from stack (source-code)
|
|
Cleaning flag address from heap (decompilation)
Cleaning heap memory for address which was used by seccomp call not covered yet. (0x1b43, 0x1b35)
Clearning flag address from heap (source-code)
|
|
General-Purpose Registers
The authors also made sure to clean up the registers and pivot the stack before the shellcode execution (0x1b45..0x1bae).
Note:
The decompiler is very useful and makes the process of reverse engineering more convenient, but sometimes it misses some information. The decompiler output didn’t made any reference to instructions (0x1b45 .. 0x1bae) present in the disassembly view.
Using dogbolt.org decompiler and the same can be observed in angr 9.2.102, Ghidra 11.0.3 and IDA Pro (Hex-Rays) 8.4.0.240320
https://dogbolt.org/?id=cb38c00f-2fb3-4f69-8ef9-8170e5890220
Special or Extension Registers
Quoting Tavis Ormandy (taviso)11 blog post about Zenbleed12:
All x86-64 CPUs have a set of 128-bit vector registers called the XMM registers. You can never have enough bits, so recent CPUs have extended the width of those registers up to 256-bit and even 512-bits.
The 256-bit extended registers are called YMM, and the 512-bit registers are ZMM.
These big registers are useful in lots of situations, not just number crunching! They’re even used by standard C library functions, like strcmp, memcpy, strlen and so on.
I only see the general-purpose registers getting cleaned. Maybe an (unintended?) solution is to get the address from any register, but we are unlucky again… I cannot see any register pointing to an interesting location after running info registers all
in gdb.
Getting the Memory from /proc/PID/maps
In Unix-like operating systems, the /proc
filesystem is a special filesystem that presents information about processes and other system information.
The /proc/PID/maps
lists the virtual address space of a process and could potentially be used to locate the flag address in memory (likely one of the first or second mapped addresses).
Unfortunately for us, reading /proc/PID/maps
is restricted to the root user (uid 0), even though the process has dropped its privilege to the ctf user (uid 1000).
Egg Hunting
With the introduction of ASLR, exploits could no longer reliably use fixed addresses in shellcode. In many exploitation scenarios, the attacker faces technical constraints such as limited payload size or restricted characters (e.g., null char, new lines, etc.).
In some instances, it is possible to implement a multi-stage shellcode, where a small first-stage shellcode stub scans the memory for a second-stage shellcode. The first 4 or 8 bytes of the second-stage payload are commonly referred to as the needle, and the entire memory is the haystack. This technique is called Egg Hunting.
A shellcode cannot directly scan the full virtual memory address space (linear memory) without risking a Segmentation fault (segfault)13.
Matt ‘Skape’ Miller14 described multiple Linux and Windows Egg Hunter techniques in his 2004 paper, Safely Searching Process Virtual Address Space15.
This challenge can be approached as an egg hunting problem. The flag starts with a known HTB{
prefix (needle), and once this location is found, we can use the write syscall to retrieve it.
Access syscall based egg hunting (failed)
One of the most common egg hunting implementations is the access syscall based egg hunting. 16
The access17 syscall has the signature int access(const char *pathname, int mode);
and checks whether the calling process can access the file pathname. In summary, the basic idea is that the *pathname can have be a pointer to any memory address, the syscall is executed in kernel space, and if the memory is not mapped in the virtual address of the process, a EFAULT
error is returned.
For our scenario, the following shellcode was executed.
|
|
Program terminated with signal SIGSYS, Bad system call.
Why is this happening?!
Seccomp
Seccomp1 (short for Secure Computing) is a Linux security feature that permits the creation of syscall filtering policies. A user-space process’s syscall invocation is checked by the seccomp policies before being served by the kernel. Only syscalls allowed by the filtering policy are permitted. For more implementation details, check the terenceli blog18.
Some of the previous sections of this blog post had references to Seccomp; it is now time to cover it.
insidious
has references to four seccomp functions: seccomp_init
, seccomp_load
, seccomp_release
, and seccomp_rule_add
.
seccomp_init
seccomp_init
is used to initialize the seccomp filter state, and it returns a scmp_filter_ctx
.
This function is invoked in the load_flag
function, and the ctx
pointer is saved to a global variable.
The seccomp_init
function takes one argument, def_action
. The value 0
in our case is the constant SCMP_ACT_KILL
, responsible for emitting the SIGSYS
signal that killed our Access syscall-based egg hunting payload.
seccomp_rule_add
seccomp_rule_add
is used to define the filter rules.
This function is also invoked in the load_flag
function, but the Binary Ninja decompiler didn’t resolve the constant values.
Luckily (or most likely intentionally), the challenge authors left an assert statement that contained more information about the source code.
Apparently, we can only use the WRITE
syscall.
seccomp_load and seccomp_release
seccomp_load
is what, in fact, applies the seccomp policy in the current process. This is invoked after our shellcode is received by the main
function (otherwise the read
syscall receiving our shellcode would fail).
After being loaded in the kernel, the global ctx
is no longer necessary and can be erased from memory (heap) with the seccomp_release
function.
Write syscall based egg hunting (failed)
Checking the write
syscall manpage (man 2 write
)19, we can see write
syscall, similar to the access
syscall, also return an EFAULT
erro in case the buf
is outside the process address space. We can use this behavior to implement a write-based egg hunter shellcode.
|
|
SIGSYS with Bad system call again? But wasn’t write
syscall allowed? What happened?
Really understanding the Seccomp filter
The Binary Ninja decompilation didn’t provide all the information, but the assert message error had the full line of code:
|
|
seccomp_rule_add
is a Variadic Function20 and can perform some basic comparisons of the syscall arguments.
Specifically for this challenge, we have:
|
|
This means that the second syscall argument (rsi
register in x86_64 calling convent), which is the const char *buf, must contain the flag address (allocation address).
We cannot use any syscall other than write with the right flag address.
Recap
At this point, we know that:
- The challenge will execute our shellcode, which is limited to 80 bytes.
- The flag is loaded in memory at a random location.
- The flag starts with
HTB{
- No register or memory during the shellcode execution points to the flag.
- The challenge implements a seccomp sandbox blocking all syscalls except
write
with the flag address. - The challenge requires reading the flag from the current process memory, as our shellcode is running in a seccomp sandbox with dropped privileges.
- We need to discover the location without causing a SEGFAULT.
Research Time
With the syscall limitation, we cannot implement classic egg hunting based on syscalls. At this time, I started wondering if this challenge would require a time-based side-channel attack.
Another possibility would be to find an egg hunting method that would not crash the process.
Prefetch Time-Based Attack (Intended Solution)
Modern computers use multiple layers of local fast caching memories, avoiding accessing the slow RAM memory (when compared to register and cache), resulting in improved execution performance 21.
Cache prefetching is a technique used by computer processors to boost execution performance by fetching instructions or data from their original storage in slower memory to faster local memory before it is actually needed 22. It can be hardware or software-based. For software implementations, the x86_64 CPU has the prefetch
family of instructions.
In 2016, a paper was published demonstrating how a Prefetch Side-Channel Attack could be used to bypass Kernel ASLR 2 23 24.
The paper suggests measuring the time that a prefetch
instruction takes to execute as an oracle, leaking information about whether the memory address is valid or not.
An implementation of this attack is present in a lab of the MIT Secure Hardware Design course 25.
During the competition, I suggested this attack, but it would require some additional understanding of the attack and benchmarking memory access times. I didn’t discard this option but kept looking for alternatives.
A full write-up of this approach can be found in the official26 challenge write-up.
TSX-Based Egg Hunting (Unintended Solution)
During the research, I also found the paper Practical Enclave Malware with Intel SGX paper6, that suggests, among other things, a better (or should I say faster and shorter) egg-hunting using Intel TSX intructions.
We also evaluated TAP as a novel method for egg hunting in regular (non-enclave) attacks, i.e., scanning the address space for injected shellcode. State-of-the-art egg hunters for Linux rely on syscalls (e.g., access) which report whether one of the parameters is a valid address. However, issuing a syscall requires the syscall instruction as well as setting up the required parameters. Thus, such egg hunters are usually larger than 30 bytes. Nemeth et al. argued that egg hunters with fault prevention cannot be smaller. However, our TAP egg hunter is only 16 bytes in size,1 i.e., the smallest egg hunter with fault prevention. With 360 cycles per address, it is also significantly faster (by a factor of 4.8) than egg hunters leveraging theaccess syscall (1730 cycles per address).
What is Intel TSX
Intel TSX (Transactional Synchronization Extensions) is an extension to the x86 instruction set architecture that provides hardware support for transactional memory. It allows developers to write code that can be executed atomically as a single transaction, ensuring consistency and isolation. Think of it as relational database transactions, but for CPU operations.
A clear proof-of-concept of using Intel TSX as an egg-hunting is available 7:
|
|
We now have everything needed to solve the challenge.
Or, well… almost.
One problem with the TSX-based shellcode is that it requires CPU support, which is limited to specific 6th-10th gen Intel consumer CPUs and v5-v6 Intel Xeon Server families.
Furthermore, due to Transactional Asynchronous Abort (TAA) - CVE-2019-1113527 28 29 vulnerability, which is part of the Microarchitectural Data Sampling (MDS)30, one mitigation 31 32 33 is to disable disable TSX feature completely.
Getting the flag
Now, I had two paths forward:
- A simpler one, leveraging TSX, requiring TSX (RTM) instruction support.
- A more elaborate one, using a pre-fetch attack, requiring benchmarking CPU timing cases.
CPU Support
Using the low-privileged shell available with the option n
, I checked the /proc/cpuinfo
and, to my surprise, the challenge server had TSX (RTM) support enabled:
Execution
With that, I only needed to blindly send the TSX egg-hunting based shellcode.
Blindly, because I didn’t have access to any machine with TSX instructions available at home.
|
|
Flag
Fortunately, the shellcode executed correctly without the need for debugging, and I received the flag:
Conclusion
This post turned out longer than anticipated, but it provides a detailed walkthrough of the challenge. It demonstrates the use of Linux egg-hunting shellcodes, leverages TSX for exploits, introduces the concept of a seccomp sandbox, and illustrates how theoretical concepts like egg hunting and side-channel attacks can be applied in real-world scenarios to solve security challenges.
Because of this challenge, I also started reviewing past research on microarchitectural attacks, side-channel techniques, trusted computing, and confidential computing. Hopefully, I can expand on this subject in future blog posts.
If you are interested in microarchitecture exploitation, check out the excellent Microarchitecture Exploitation playlist34 from https://pwn.college/35 at Arizona State University and the references below.
-
https://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions ↩︎
-
https://github.com/IAIK/sgxrop/blob/master/egghunter/hunt.c ↩︎ ↩︎
-
https://github.com/hackthebox/business-ctf-2024/blob/main/pwn/%5BMedium%5D%20Insidious/src/insidious.c ↩︎
-
https://en.wikipedia.org/wiki/Address_space_layout_randomization ↩︎
-
https://www.hick.org/code/skape/papers/egghunt-shellcode.pdf ↩︎
-
https://www.hick.org/code/skape/papers/egghunt-shellcode.pdf ↩︎
-
https://terenceli.github.io/%E6%8A%80%E6%9C%AF/2019/02/04/seccomp ↩︎
-
https://www.man7.org/linux/man-pages/man2/write.2.html#:~:text=has%20been%20exhausted.-,EFAULT,-buf%20is%20outside ↩︎
-
http://csg.csail.mit.edu/6.S983/labs/aslr/#part-1b-the-prefetch-instruction-40 ↩︎
-
https://github.com/hackthebox/business-ctf-2024/tree/main/pwn/%5BMedium%5D%20Insidious/README.md ↩︎
-
https://www.intel.com/content/www/us/en/architecture-and-technology/mds.html ↩︎
-
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html ↩︎
-
https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html ↩︎
-
https://github.com/IAIK/sgxrop/tree/master?tab=readme-ov-file#note-on-broken-microcode ↩︎
-
https://www.youtube.com/watch?v=mMC_vYSHbjI&list=PL-ymxv0nOtqoU92gd9MEX4ABDGW6nvVma ↩︎