Beginning of the Journey to the T9 Project

New technologies and services keep emerging as ICT advances rapidly. From a cyber security perspective, however, each new technology or service is also a potential new attack target. According to data published by the MITRE Corporation, shown in [Table 1], the number of CVE (Common Vulnerabilities and Exposures) records registered per year grew steadily from 18,375 in 2020 to 29,065 in 2023. The pace of that growth shows just how many cyber threats we are exposed to.

Table 1. Number of CVE registrations per year (Source: www.cve.org)

Year      2020      2021      2022      2023
Q1       4,807     4,415     6,015     7,015
Q2       5,011     5,005     6,365     7,134
Q3       4,170     5,541     6,448     6,936
Q4       4,387     5,200     6,231     7,980
Total   18,375    20,161    25,059    29,065
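
To put the growth in Table 1 into numbers, the short Python sketch below computes the year-over-year change from the yearly totals. The totals are copied from the table; the function and variable names are ours and only illustrative.

```python
# Yearly CVE totals copied from Table 1.
cve_totals = {2020: 18375, 2021: 20161, 2022: 25059, 2023: 29065}


def yoy_growth(totals):
    """Return {year: percent growth over the previous year}."""
    years = sorted(totals)
    return {
        cur: (totals[cur] - totals[prev]) / totals[prev] * 100
        for prev, cur in zip(years, years[1:])
    }


growth = yoy_growth(cve_totals)
for year, pct in growth.items():
    print(f"{year}: {pct:+.1f}%")

# Growth over the whole period covered by the table.
overall = (cve_totals[2023] - cve_totals[2020]) / cve_totals[2020] * 100
print(f"2020 -> 2023: {overall:+.1f}%")
```

With the figures in the table, this works out to roughly 58% more CVE registrations in 2023 than in 2020.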

To counter cyber threats that keep growing in number and variety, organizations have traditionally deployed NIDS (Network-based Intrusion Detection Systems), which detect network attacks, and HIDS (Host-based Intrusion Detection Systems), which detect host-based attacks. Intrusion detection can also be divided into rule-based and behavior-based approaches. Rule-based intrusion detection matches traffic or events against pre-defined detection patterns (signatures or rules); it is highly accurate for known attacks but limited against new attack types or patterns. Behavior-based intrusion detection, on the other hand, is effective against new types of attacks but is less accurate and requires more time and resources than rule-based detection. Recent work therefore focuses on combining the two methods efficiently and on applying AI (Artificial Intelligence) to attack analysis and detection.
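
The difference between the two detection styles can be illustrated with a toy Python sketch: one function matches requests against fixed signatures, the other flags a value that deviates strongly from a learned baseline. The signatures, the 3-standard-deviation threshold, and the sample numbers are all made up for illustration and are not taken from any real IDS.

```python
import re
from statistics import mean, stdev

# Hypothetical signatures, for illustration only.
SIGNATURES = {
    "path_traversal": re.compile(r"\.\./|%2e%2e%2f", re.IGNORECASE),
    "sql_injection": re.compile(r"union\s+select", re.IGNORECASE),
}


def rule_based_detect(request_line):
    """Return the names of all signatures that match the request."""
    return [name for name, pat in SIGNATURES.items() if pat.search(request_line)]


def behavior_based_detect(baseline, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` sample standard
    deviations away from the mean of the baseline measurements."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Known pattern: caught by a signature.
print(rule_based_detect("GET /download?file=../../etc/passwd HTTP/1.1"))
# No signature matches, so a new attack type would pass unnoticed here.
print(rule_based_detect("GET /index.html HTTP/1.1"))

# Requests per second seen during normal operation (made-up numbers).
baseline_rps = [98, 102, 101, 97, 100, 103, 99]
print(behavior_based_detect(baseline_rps, 104))  # within normal variation
print(behavior_based_detect(baseline_rps, 450))  # strong deviation
```

The signature check needs no training but only knows what it has been told; the deviation check needs no prior knowledge of the attack but depends on a good baseline and a well-chosen threshold, which is where false positives come from.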

The most important factor in building an AI-based cyber threat detection model is the quality and size of the training dataset. In other words, datasets collected from actual attacks are crucial to the model’s performance, and the benign dataset is just as important as the attack dataset. Until now, KDD99, NSL-KDD, and CICIDS-2017 have been widely used to build cyber threat detection models, especially for network attacks. However, these datasets have several problems. First, they are outdated and do not reflect recent attack trends. Second, they have quality issues because they are biased toward certain attack techniques. Lastly, they contain little data on the encrypted protocols that have recently come into use. Nevertheless, there are no other options that can replace them. As a result, researchers and enterprises use these open datasets to build cyber threat detection models while also spending a lot of time and resources on building their own attack datasets, either in-house or by paying a specialized company (or group) a fairly large budget.

Our team is therefore researching a framework that automatically builds the Attacker and Victim environments, performs a variety of attacks programmatically, and collects the attack data automatically. To reflect recent attack trends, our research team also re-defines and classifies the cyber attack TTPs (Tactics, Techniques, and Procedures) from MITRE ATT&CK to create attack scenarios and code that yield a high-quality attack dataset. So now, let’s take a look at our research.

What is the T9 Project?

First of all, let’s look at the meaning of T9. The T is the first letter of Trident, the symbol of Poseidon, the god of the sea, and the 9 is the number of tridents, where one trident represents one cyber attack (per tool, code, or scenario). Now that we’ve briefly covered the meaning of T9, let’s dive deeper into our long-term project, the T9 Project.

Figure 1. Poseidon with Trident

As shown in [Figure 2], the T9 Project consists of the T9 Framework, which automatically generates attacks and the collection environment based on threat scenarios; T9, the collection of attack tools for the 9 attack scenarios; T9 Data, the database of T9; and Social Media, meaning the website and GitHub, where the dataset is shared, and the KAIST CSRC Blog.

Figure 2. T9 Project Technical diagram

T9 Framework

For example, in a single-attack scenario, when the user selects a cyber attack (one of the T9 Data entries) at the prompt, the Attacker and Victim environments are built in virtual environments (Docker or VM). Each Victim environment has a logging system installed that collects PCAP, memory, network, process, and registry data, and it automatically collects the attack data when an attack is executed from the Attacker environment.

Figure 3. Single Attack flow graph of T9 Framework (Example)

① Select the desired cyber attack built into the T9 Framework
② Automatically build the environment for the selected cyber attack on the virtualization platform
③ Execute the attack tool that is ready in the Attacker environment
④ Generate the attack data
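
The four steps above can be sketched in Python as a dry-run planner that only prints the Docker commands such a workflow might issue. This is not the T9 Framework’s actual code: the catalog entry, image names, attack command, container names, and capture path are all hypothetical, and only the Docker variant (not the VM one) is shown.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Scenario:
    code: str             # T9 Data code name
    attacker_image: str   # hypothetical image name
    victim_image: str     # hypothetical image name
    attack_cmd: list      # hypothetical command run inside the attacker


# Step 1: the catalog the user selects from (this entry is made up).
CATALOG = {
    "T9-2301SNA": Scenario(
        code="T9-2301SNA",
        attacker_image="t9/attacker-path-traversal",
        victim_image="t9/victim-webserver",
        attack_cmd=["python3", "/opt/attack/run.py", "http://victim:8080"],
    ),
}


def build_plan(s):
    """Return the ordered list of commands for one single-attack run."""
    net = "net-" + s.code.lower()
    pcap = "/tmp/attack.pcap"  # assumed capture path inside the victim
    return [
        # Step 2: build the Attacker and Victim environments.
        ["docker", "network", "create", net],
        ["docker", "run", "-d", "--name", "victim", "--network", net,
         s.victim_image],
        ["docker", "run", "-d", "--name", "attacker", "--network", net,
         s.attacker_image, "sleep", "infinity"],
        # Start packet capture (assumes tcpdump exists in the victim image).
        ["docker", "exec", "-d", "victim", "tcpdump", "-i", "any", "-w", pcap],
        # Step 3: execute the attack tool from the Attacker environment.
        ["docker", "exec", "attacker", *s.attack_cmd],
        # Step 4: copy out the collected data (a real run would stop
        # tcpdump first so the capture file is complete).
        ["docker", "cp", "victim:" + pcap, s.code + ".pcap"],
    ]


plan = build_plan(CATALOG["T9-2301SNA"])
for cmd in plan:
    print(shlex.join(cmd))
```

Keeping the plan as plain data makes it easy to review or log before anything is executed; an actual runner would pass each command to `subprocess.run` and add cleanup of the containers and network.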

T9 Data

T9 Data is the accumulated set of attack scenarios. Each release is one layer of 9 attack scenarios based on MITRE ATT&CK TTPs; releases are published twice a year (once per half-year term) and named under the convention described below.

* T9-2301SNA
: 23 (year)
: 01 (release number within the year, 1 or 2; for the first data released in 2023, the number is 1)
: S (S: Single Attack, M: Multi Attack)
: N (N: NDR, E: EDR, NE: NDR/EDR)
: A (based on the 14 MITRE ATT&CK tactics, lettered from A for the first tactic (Reconnaissance) to N for the last tactic (Impact); can be repeated)
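
As a sketch of how such a code name could be decoded, the Python snippet below splits it into its fields with a regular expression. Several points are our assumptions rather than part of the convention: the two-digit year is read as 20xx, optional hyphens between fields are accepted as a convenience, and the tactic letters follow the order of the MITRE ATT&CK Enterprise matrix. A compact code in which "N" is directly followed by "E" is ambiguous (N plus tactic E, or NE); this sketch reads it as NE.

```python
import re
from dataclasses import dataclass

# The 14 Enterprise tactics in matrix order, lettered A..N.
TACTICS = [
    "Reconnaissance", "Resource Development", "Initial Access", "Execution",
    "Persistence", "Privilege Escalation", "Defense Evasion",
    "Credential Access", "Discovery", "Lateral Movement", "Collection",
    "Command and Control", "Exfiltration", "Impact",
]
ATTACK_TYPES = {"S": "Single Attack", "M": "Multi Attack"}
DATA_TYPES = {"N": "NDR", "E": "EDR", "NE": "NDR/EDR"}

# year, release (01 or 02), attack type, data type, one or more tactic letters
CODE_RE = re.compile(r"^T9-?(\d{2})-?(0[12])-?([SM])-?(NE|N|E)-?([A-N]+)$")


@dataclass
class T9Code:
    year: int
    release: int
    attack_type: str
    data_type: str
    tactics: list


def parse_t9_code(code):
    """Decode a T9 Data code name such as 'T9-2301SNA'."""
    m = CODE_RE.match(code.strip().upper())
    if m is None:
        raise ValueError(f"not a valid T9 code name: {code!r}")
    yy, release, attack, data, letters = m.groups()
    return T9Code(
        year=2000 + int(yy),  # assumption: two-digit years mean 20xx
        release=int(release),
        attack_type=ATTACK_TYPES[attack],
        data_type=DATA_TYPES[data],
        tactics=[TACTICS[ord(c) - ord("A")] for c in letters],
    )


print(parse_t9_code("T9-2301SNA"))
```

For the example code name, this yields the year 2023, release 1, a Single Attack, NDR data, and the Reconnaissance tactic.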

Social Media

All information about T9 Data and the attack datasets created by the T9 Framework will be released on the T9 website (https://www.t9project.dev, expected to launch on July 17th, 2024) and GitHub. In addition, when a certified user of the T9 website enters a code name or vulnerability that exists in T9 Data, the user will be able to search for and download the Attacker environment, which includes the attack tool for that attack, and the matching Victim environment. Users can then collect the attack data and the attack results by running the attack tool in the provided Attacker environment.

For example, if the example above, “T9-2301SNA” (field by field: T9-23-01-S-N-A), is a Path Traversal attack, the attack tool that performs the Path Traversal attack will be ready in the Attacker environment, and a web server targeted by the attack and a system that collects the attack logs will be set up in the Victim environment. In this way, the T9 Framework automatically builds the cyber attack environment, and users can easily collect the attack dataset simply by executing the attack tool manually. Moreover, the ultimate goal of the T9 Project is to maximize the performance and use of AI threat detection models, with the utilizations and expectations described below.

[Utilizations and Expectations]

* Development of threat detection solutions (NDR/EDR)
* Evaluation and validation of threat detection solutions (NDR/EDR)
* Development of threat detection AI models
* Attack training datasets for research purposes
* Training tool for learning threat response

Conclusion

We have now looked at what the T9 Project is, why the research is being done, how the attack scenarios are created, and how the project will scale. Starting from the initial release date (July 17th, 2024), the T9 Project will release attack datasets from 9 attack environments twice a year (18 in total per year) on GitHub and the website. In our next blog post, we will introduce the first release of the T9 Project and how to use it, so stay tuned for our latest news and updates.

References

[1] https://www.cve.org/About/Metrics
[2] https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[3] https://www.unb.ca/cic/datasets/nsl.html
[4] https://www.unb.ca/cic/datasets/dos-dataset.html
[5] Korea Institute of Science and Technology Information (KISTI), Analysis of the Latest Cyber Threat Trends and Countermeasures, 2023
[6] https://attack.mitre.org/
[7] https://commons.wikimedia.org/wiki/File:Wireshark_Icon.png
[8] https://namu.wiki/w/%ED%8F%AC%EC%84%B8%EC%9D%B4%EB%8F%88
