Earlier this year, we began our journey with the T9 Project, as outlined in our post titled “Beginning of Journey to T9 Project”. Now, the first release of T9 Data is almost here. In this post, we briefly introduce the background and purpose of the T9 Project, explain why we selected the specific attacks and data for release, and describe how we built the environment.
Why T9?
The T9 Project was initiated for three reasons. First, there is a lack of high-quality training datasets for cyberattack detection AI models (security AI models). Second, existing training datasets do not reflect the latest cyberattack trends. Third, creating cyberattack training datasets requires significant time and effort. To address these limitations, we regularly replicate the latest cyberattacks (such as Apache Log4Shell and CryptoWire Ransomware) in an automated environment that executes the attacks and collects the resulting logs (packets, system activity logs, etc.). These logs are then used to develop and improve security AI models.
Table 1. T9 Project 2024 Attack List (2024-01)
No. | T9 Attack ID | Domain | Name / Method
1 | T1-24-01-S-N-CL | Network | Apache Log4Shell
2 | T2-24-01-S-N-CL | Network | SMBGhost
3 | T3-24-01-S-N-CL | Network | Apache ActiveMQ Deserialization
4 | T4-24-01-S-E-M | End Point | CryptoWire Ransomware
5 | T5-24-01-S-E-LM | End Point | XMRing Miner
6 | T6-24-01-S-E-FH | End Point | Su Brute-Force
7 | T7-24-01-M-NE-CLM | Network End Point | Apache Log4Shell + XMRing Miner
8 | T8-24-01-M-NE-CFHL | Network End Point | Apache ActiveMQ + Su Brute-Force
9 | T9-24-01-M-NE-CLM | Network End Point | SMBGhost + CryptoWire Ransomware
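The attack IDs in Table 1 appear to follow a regular pattern, so they can be split into fields programmatically. The following is a minimal sketch, not an official T9 parser. The field meanings are assumptions inferred only from the rows of the table: `S`/`M` seem to mark single versus combination attacks, and `N`/`E`/`NE` seem to mirror the Domain column. The trailing letters (such as `CL` or `CFHL`) are not explained in this post, so the sketch keeps them as an uninterpreted string. The names `T9AttackId` and `parse_attack_id` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred from Table 1 only:
#   T<number>-<year>-<release>-<S|M>-<N|E|NE>-<tags>
_ID_PATTERN = re.compile(
    r"^T(?P<number>\d+)-(?P<year>\d{2})-(?P<release>\d{2})"
    r"-(?P<kind>[SM])-(?P<domain>NE|N|E)-(?P<tags>[A-Z]+)$"
)

# Assumed meanings: rows 1-6 use S and are single attacks,
# rows 7-9 use M and are combinations.
_KIND = {"S": "single", "M": "combination"}
_DOMAIN = {"N": "Network", "E": "End Point", "NE": "Network End Point"}


@dataclass(frozen=True)
class T9AttackId:
    number: int
    release: str  # e.g. "2024-01"
    kind: str     # "single" or "combination"
    domain: str   # same wording as the Domain column
    tags: str     # left uninterpreted; the post does not define these letters


def parse_attack_id(attack_id: str) -> T9AttackId:
    """Split an ID such as 'T7-24-01-M-NE-CLM' into its assumed fields."""
    match = _ID_PATTERN.match(attack_id)
    if match is None:
        raise ValueError(f"not a recognized T9 attack ID: {attack_id!r}")
    return T9AttackId(
        number=int(match["number"]),
        release=f"20{match['year']}-{match['release']}",
        kind=_KIND[match["kind"]],
        domain=_DOMAIN[match["domain"]],
        tags=match["tags"],
    )
```

For example, `parse_attack_id("T4-24-01-S-E-M")` yields a single End Point attack from release 2024-01, which matches row 4 of the table.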
The attacks listed in Table 1 are recent incidents that have caused significant social disruption or have occurred within the last two years. Some attacks, like Apache Log4Shell and SMBGhost, are older but were selected due to their substantial social impact and consequences.
The attack domains in Table 1 are categorized based on where the attacks can be detected: “Network” refers to attacks detectable through network packet analysis, while “End Point” refers to attacks detectable through system logs collected from the host. Not all attacks can be detected solely by using network or host information. For example, Apache Log4Shell attacks can be detected on the host where the command is executed but are more effectively detected using network packet data. We therefore assigned each attack to the domain in which it is detected most effectively.
Now, let’s focus on a crucial aspect of the T9 Project: the automatic collection of attack logs. The collection process depends on the configured attack environment: we use Docker for network attacks and VirtualBox for endpoint or combination attacks (a single attack is one T9 attack run on its own, while a combination attack involves two or more single attacks). For log collection, we used packet dump applications such as tcpdump and pktmon to collect packets for network attacks, and Sysmon by Microsoft to collect system logs for endpoint attacks.
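The choice of environment and collector described above can be summarized as a small lookup. This is only an illustrative sketch: `CollectionPlan` and `plan_for` are hypothetical names, and the rule that a combination attack uses both the packet collector and Sysmon is an assumption, since the post states the collectors only for network and endpoint attacks separately.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CollectionPlan:
    environment: str             # "Docker" or "VirtualBox"
    collectors: Tuple[str, ...]  # log collectors to start before the attack


def plan_for(domain: str) -> CollectionPlan:
    """Map a Domain value from Table 1 to an environment and collectors."""
    has_network = "Network" in domain
    has_endpoint = "End Point" in domain
    if not (has_network or has_endpoint):
        raise ValueError(f"unknown domain: {domain!r}")

    # Docker for pure network attacks; VirtualBox for endpoint
    # or combination attacks, as stated in the post.
    environment = "VirtualBox" if has_endpoint else "Docker"

    collectors = []
    if has_network:
        collectors.append("tcpdump or pktmon")  # packet capture
    if has_endpoint:
        collectors.append("Sysmon")             # system logs
    # Assumption: a combination attack runs both kinds of collector.
    return CollectionPlan(environment, tuple(collectors))
```

Calling `plan_for("Network")` selects Docker with packet capture only, while `plan_for("Network End Point")` selects VirtualBox with both collectors under the assumption stated above.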
Figure 1. Example of T9 Project attack data collection
Figure 1 illustrates the process of collecting attack data. While the details may vary depending on the deployment, the overall process remains similar. When ‘run.py’ is executed to perform an attack, the virtual environment is launched first, followed by the initiation of the log collector. After the attack is executed, the log collection stops, and the logs are sent to the host. Details about the configuration of the environment and log collection for each attack (2024-01 attack list) can be found on our T9 website (https://t9project.dev/). You may also download the minimally collected raw attack data from the website.
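The order of operations in Figure 1 can be sketched as follows. This is not the project's actual `run.py`; it is a hedged outline in which every step is a caller-supplied function, and all names (`run_attack_collection`, `demo`, and the step labels) are hypothetical. One design choice here is an assumption of the sketch rather than something the post states: the collector is stopped in a `finally` block so that it does not keep running if the attack step fails.

```python
from typing import Callable, List

Step = Callable[[], None]


def run_attack_collection(
    launch_environment: Step,
    start_collector: Step,
    execute_attack: Step,
    stop_collector: Step,
    send_logs_to_host: Step,
) -> None:
    """Run the five steps in the order described for Figure 1."""
    launch_environment()
    start_collector()
    try:
        execute_attack()
    finally:
        # Always stop collecting, even if the attack step raised.
        stop_collector()
    # Reached only when the attack step completed without an exception.
    send_logs_to_host()


def demo() -> List[str]:
    """Record the step order using placeholder steps instead of real tools."""
    trace: List[str] = []

    def step(name: str) -> Step:
        return lambda: trace.append(name)

    run_attack_collection(
        step("launch environment"),
        step("start collector"),
        step("execute attack"),
        step("stop collector"),
        step("send logs to host"),
    )
    return trace
```

In a real deployment, each placeholder would wrap the corresponding Docker or VirtualBox command and the chosen log collector; those commands are deliberately left out because they vary by attack.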
Welcome to the T9 website!
The T9 Project website comprises four main sections: Home, Attack, Dataset, and Contact Us. The Home section provides the background and purpose of the T9 Project and an introduction to our overall research.
Figure 2. Front page of the T9 Project website
The Attack section contains a detailed description of each attack, instructions for building and running the environment, MITRE ATT&CK tactic correlations, and the collected attack data (packets, logs, etc.).
Figure 3. Attack page of the T9 Project website
The Dataset section lists attack data available for download (pcap for network logs, evtx for endpoint logs, and log for Windows). The Contact Us section provides contact information for requesting additional information such as deployment environment and attack source.
Figure 4. Dataset page of the T9 Project website
Conclusion
In this blog post, we briefly described the implementation and log collection of the T9 Project and introduced the T9 website, where you can obtain T9 Data. We will continue to analyze the latest cyberattacks and update T9 Data periodically. Moreover, in 2025, we will release the benign dataset and a cyberattack detection AI model created using it, intended for practical use in security AI models. The T9 Data 2024-02 update is scheduled for December 17, so stay tuned for our latest news and updates.
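A member of the malware analysis team at the KAIST Cyber Security Research Center, conducting malware analysis and research.
A researcher on the malware analysis team at the KAIST Cyber Security Research Center, conducting research on blockchain and software testing.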