Performance and Functionality Evaluation of White-box Software Testing Tools, Part 2


Evaluating automated white-box software testing tools

In our previous post, ‘Performance and Functionality Evaluation of White-box Software Testing Tools, Part 1’, we illustrated the importance of software safety through a number of software incidents, and then introduced white-box software testing as a means of software verification, along with the available tools. Software testing methods, which aim to discover errors or bugs introduced during the software development process, are categorized into black-box testing, white-box testing, and gray-box testing, which combines the advantages of the two. In this post, we conduct source-code-level white-box testing, a method based on the internal structure and operating principles of a program. White-box testing tools are used in a wide variety of industries, for example to test software for automobiles, aircraft, and ships. In particular, with the recent increase in electronic equipment in vehicles, vehicle manufacturers are focusing on securing software and hardware quality. In line with this trend, the International Organization for Standardization (ISO) included requirements for testing software components in ISO 26262, the international standard for the functional safety of electronic devices in vehicles, first published in 2011.

Standardized testing of software components is becoming increasingly important in all industries. Given this trend, what is an effective way to increase software quality, stability, and reliability? One possible answer is to conduct white-box software testing to verify that the software is free of defects, or to identify and fix any existing defects. To do so, the software is run against a suite of well-defined test cases. A test case is a set of executable code snippets or instructions created to check that the software performs as expected. In other words, it is a criterion for testing whether the correct output value is produced for each input value derived from the requirements specification.
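To make the notion of a test case concrete, here is a minimal sketch in Python. The function `clamp` and the three cases are hypothetical and are not taken from any of the tools or projects discussed in this post. Each case simply pairs input values with the output that a requirements specification would call for.

```python
# Hypothetical function under test: limit a value to the range [low, high].
def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value


# Each test case pairs the input arguments with the expected output.
TEST_CASES = [
    ((5, 0, 10), 5),    # value inside the range
    ((-3, 0, 10), 0),   # value below the range
    ((42, 0, 10), 10),  # value above the range
]


def run_test_cases(func, cases):
    """Return the cases whose actual output differs from the expected output."""
    failures = []
    for args, expected in cases:
        actual = func(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


if __name__ == "__main__":
    # An empty list means every test case passed.
    print(run_test_cases(clamp, TEST_CASES))
```

Together, the three cases take both outcomes of each `if` statement in `clamp`, which is the kind of property that white-box coverage metrics measure.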

Traditionally in software testing, the user manually writes test cases based on the test requirements. However, as the size and complexity of software systems increase, this manual approach quickly stops scaling. Because modern software systems need more efficient analysis and design, active research has focused on automating test case generation. These efforts have resulted in a number of automated white-box software testing tools.

So far, we have introduced the background and need for automated white-box software testing and the corresponding tools. However, a software developer who wishes to purchase and use such a tool has practical considerations in mind. For instance, is a tool that generates as many test cases as possible more suitable? Or are there situations in which a tool that generates a minimal set of test cases is preferable? There is no definitive answer. Because each tool follows different standards and methodologies for generating test cases, users (customers) tend to choose testing tools based only on informal measures, such as familiarity or the vendor's market share. Therefore, in this post, we systematically conduct a qualitative and quantitative evaluation of the performance and functionality of automated white-box testing tools, in an attempt to provide measurable criteria for selection.

List of automated white-box software testing tools

As mentioned, in the traditional white-box testing method, users (humans) manually write test cases by specifying appropriate preconditions, execution methods, and input and output values, and then test whether the software behaves accordingly. To ease this process, an automated white-box testing tool builds and explores the project's source code using its built-in compiler, without user intervention.

The following is a summary and key features of a number of automation tools that are capable of generating test cases for projects (source code) written in C/C++. All listed information is based on the product catalogue of the corresponding tool.

Product Name: Key Features

Controller Tester
– Developed by SureSoft Technologies, a Korean company
– Automated testing solution for unit and integration tests
– Automatically generates tests and test data, and supports testing in both simulated and real target environments
– Automatically generates stubs for functions without definitions (such as library functions)
– Provides control flow graphs and shows coverage results alongside the source code
– Can automatically insert fault injection code without modifying the source code

Coyote
– Developed by Code Mind, a Korean company
– Fully automatic white-box testing tool that combines symbolic testing and machine learning technology
– Symbolic testing technology keeps testing efficient by avoiding, as far as possible, revisiting branches or paths that are already covered
– Fully automates unit testing, from one-click creation through execution to results analysis, without user intervention
– Achieves code coverage above 80 to 90% with full automation

Resort
– Developed by Soft4Soft, a Korean company
– Static analysis allows source code to be analyzed without compiling it
– Top-of-the-line code verification covering inter-procedural path analysis, coding standards, runtime errors, security vulnerabilities, and code quality checks
– Automates path testing based on code requirements using dynamic analysis
– Automatically extracts execution paths as test cases and generates test data (test domains, input values) for each path

Cantata++
– Developed by QA Systems, a German company; Hancom Nflux Co., Ltd. is the reseller in Korea
– Automated unit and integration testing tool for C/C++ that combines unit testing and scalable integration testing in a single framework
– Supports functional testing and integrated analysis of code inspection results
– Reports can be generated in RTF (Text), XML, and HTML formats, along with ASCII and HTML text
– Built on the Eclipse IDE, providing a familiar UI to Eclipse users
– Presents code coverage results visually
– Automatically creates a collection of unit tests that runs all required code paths

VectorCAST/C++
– Developed by Vector, a German company with a branch in Korea
– Specialized in verifying embedded systems that prioritize software safety
– Automatically configures the test environment for unit and integration tests, so users do not need to write test code
– Supports testing via GUI and script, code coverage, regression testing, and code complexity calculation
– Built-in compiler, so there is no need to set up a separate compilation environment
– Supports user-defined tests for requirements-based testing

ParaSoft C/C++
– Developed by ParaSoft, an American company, and supplied in Korea by several companies, including VWAY
– Automates software testing and automatically generates verification reports
– Integrates into C/C++ IDEs, CI/CD pipelines, and containerized deployments to detect defects early and automatically enforce compliance with industry standards
– Supports building an automated, scalable CI pipeline using Docker Hub images for access to the latest automated code analysis
– VS Code plug-in for GitLab enables results to be reviewed within the IDE

LDRA
– Developed by LDRA, a British company, and supplied in Korea by Moasoft
– A leader in the software analysis and automated testing tools market for over 45 years, and one of the leading software testing companies
– Widely used, in Korea and abroad, for software testing in aerospace and defense, automotive, industrial and energy, healthcare, and rail transport
– Provides test case management and automated execution for unit and integration testing
– Provides requirement traceability and can automatically generate test cases that achieve 50-80% code coverage

Figure 1. International Standard ISO 25010 Evaluation Criteria (Source: splex)

What features and elements are essential for automated white-box testing tools?

On what criteria should automated white-box testing tools be evaluated? To establish qualitative and quantitative grounds for testing, we referred to two standards: (i) “Commercial software evaluation criterion and score assignment” in the “Guidelines on Technical Assessment on Software” (Article 49, Paragraph 2 of the Software Promotion Act), and (ii) ISO 25010 (the international standard for software evaluation). Based on these standards, we set up 34 separate evaluation criteria and grouped similar criteria into a three-level hierarchy: evaluation property, evaluation item, and individual criterion. Where the ISO criteria were too broad, ambiguous, or unnecessary for the purpose of this study, we narrowed, revised, or dropped them.

For instance, the ISO 25010 property “Functional Suitability” shown in Figure 1 was revised to “Functionality” in our evaluation criteria (Figure 2), and likewise “Functional Maturity”, “Functional Accuracy”, and “Functional Validity” were reduced to “Accuracy” in our version. As there is no practical way to evaluate “Confidentiality” and “Non-repudiation” under “Security” in our study, we moved them into the “Functionality” property in order to evaluate a minimal level of security. The properties “Usability” and “Reliability” are kept as-is, but “Maturity” and “Fault Tolerance” under “Reliability” are excluded from the evaluation criteria because they are less significant here. Additionally, “Performance Efficiency”, “Portability”, and “Compatibility” are folded into “Usability” and “Reliability”, and “Stability” was renamed “Operational Stability” and included in “Reliability”. The overall evaluation items and criteria are shown in Figure 2.

Selection of open source projects for evaluation

White-box testing tools are widely used in fields that require highly reliable and safe software, such as aerospace, military, automotive, and other safety-critical industries. Most software in these fields is written in C/C++, so vendors of automated white-box testing tools mostly target C/C++ projects. In our study, we likewise selected C/C++ open-source software commonly used in performance and functional evaluations, including libraries for system UI, encryption, signal processing, communication protocols, and more. Specifically, we referred to the paper “CITRUS: Automated Unit Testing Tool” for real-world C++ programs. Additionally, we kept only open-source projects that can be built natively without the aid of any external software.

Figure 2. Evaluation Criteria

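| No. | Project Name | Release Date | Language | Number of Files* | LOC** | Size |
|---|---|---|---|---|---|---|
| 1 | nuklear | 2019.12.02 | C | 192 | 131828 | 10 MB |
| 2 | libsodium | 2019.5.31 | C | 690 | 51132 | 44.8 MB |
| 3 | mathc | 2019.5.31 | C | 4 | 5886 | 223 KB |
| 4 | aubio | 2022.1.26 | C | 347 | 17381 | 1.6 MB |
| 5 | s2n-tls | 2022.10.25 | C | 781 | 792793 | 17.9 MB |
| 6 | qnite | 2022.04.14 | C++ | 138 | 2372 | 168 KB |
| 7 | QPULib | 2020.12.09 | C++ | 82 | 5611 | 978 KB |
| 8 | yaml-cpp | 2021.07.10 | C++ | 399 | 54922 | 4.9 MB |
| 9 | jsoncpp | 2021.08.12 | C++ | 250 | 8271 | 828 KB |
| 10 | json-voorhees | 2021.07.12 | C++ | 227 | 8421 | 3 MB |

* Total number of files in the project (including other files such as Readme.md, excluding some files such as resources)
** LOC: Lines of Code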

Table 1. Projects for Evaluation

Selection of automated white-box testing tools and test results on open-source projects

In our study, we selected four of the seven automated white-box software testing tools according to their market share, available product information, and technical support, and named them A, B, C, and D. The selected tools were used to analyze the open-source projects listed in Table 1. We set up each test environment ourselves by following the manual provided with the product, but none of the selected tools could proceed entirely without user intervention. For products A, B, and D, only the ‘mathc’ project (number 3 in Table 1) could be tested normally, and for product C it was impossible to test any of the 10 selected projects, including ‘mathc’. Please refer to the section “Evaluating automated white-box testing tools” for the specific reasons why testing was impossible.

Apart from ‘mathc’, the nine open-source projects caused unknown errors, including compilation failures, and software testing could not be completed. Considering that these failures might have resulted from our inexperience with the tools, we requested technical support from each company. For tool ‘B’, the vendor provided a simple configuration file, which was sufficient to complete automated testing of all ten projects within a few days. For the other three tools (A, C, D), on the other hand, it took close to two months to fully resolve the issues, even with technical help in configuring the test environment and compilers and in handling a series of other steps. We had to invest considerable time and effort, and the procedure for getting technical help was so complicated that separate phone calls or remote meetings were required for each project. Multiple expert interventions were necessary, to the extent that we doubted whether this process could be called fully automated. Because of these difficulties, our understanding of ‘automated software testing’ changed from what it was when we started this study. The results are shown in Table 2. Of the four selected tools, tool ‘B’ satisfied more than 90% of our criteria for evaluating ‘automated testing’, which demonstrates the strength of its automation.

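| No. | Project Name | Release | Language | Files | LOC | Analyzed files | Number of functions | Line coverage | Branch coverage | Number of test cases | Build time | Test time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | nuklear | 2019.12.02 | C | 192 | 131828 | 67 | 609 | 87.21% | 79.88% | 5294 | 00:03:15 | 02:04:02 |
| 2 | libsodium | 2019.5.31 | C | 690 | 51132 | 195 | 887 | 94.94% | 84.93% | 3223 | 00:04:48 | 00:09:15 |
| 3 | mathc | 2019.5.31 | C | 4 | 5886 | 1 | 843 | 99.43% | 100% | 1479 | 00:00:24 | 00:10:21 |
| 4 | aubio | 2022.1.26 | C | 347 | 17381 | 139 | 520 | 93.98% | 89.73% | 4906 | 00:02:45 | 02:29:15 |
| 5 | s2n-tls | 2022.10.25 | C | 781 | 792793 | 688 | 1621 | 86.44% | 80.53% | 19464 | 00:19:39 | 09:12:36 |
| 6 | qnite | 2022.04.14 | C++ | 138 | 2372 | 95 | 645 | 95.64% | 89.0% | 3471 | 00:16:42 | 02:38:28 |
| 7 | QPULib | 2020.12.09 | C++ | 82 | 5611 | 40 | 278 | 86.66% | 81.97% | 3801 | 00:01:25 | 00:34:57 |
| 8 | yaml-cpp | 2021.07.10 | C++ | 399 | 54922 | 155 | 367 | 95.52% | 93.85% | 3985 | 00:05:11 | 02:15:39 |
| 9 | jsoncpp | 2021.08.12 | C++ | 250 | 8271 | 14 | 309 | 91.21% | 87.2% | 5645 | 00:00:49 | 02:50:49 |
| 10 | json-voorhees | 2021.07.12 | C++ | 227 | 8421 | 61 | 451 | 90.39% | 84.56% | 3246 | 00:03:40 | 00:40:14 |

Table 2. White-box testing results for automated software testing tool ‘B’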
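As a rough cross-check of the figures in Table 2, the sketch below averages the line and branch coverage that tool ‘B’ reported, once over all ten projects and once without ‘mathc’. The coverage values are copied from Table 2; the helper name `average_coverage` is our own and is not part of any tool.

```python
from statistics import mean

# (project, line coverage %, branch coverage %) as reported in Table 2.
TABLE_2 = [
    ("nuklear", 87.21, 79.88),
    ("libsodium", 94.94, 84.93),
    ("mathc", 99.43, 100.0),
    ("aubio", 93.98, 89.73),
    ("s2n-tls", 86.44, 80.53),
    ("qnite", 95.64, 89.0),
    ("QPULib", 86.66, 81.97),
    ("yaml-cpp", 95.52, 93.85),
    ("jsoncpp", 91.21, 87.2),
    ("json-voorhees", 90.39, 84.56),
]


def average_coverage(rows, exclude=None):
    """Return (mean line %, mean branch %), rounded to one decimal place."""
    kept = [row for row in rows if row[0] != exclude]
    line = mean(row[1] for row in kept)
    branch = mean(row[2] for row in kept)
    return round(line, 1), round(branch, 1)


if __name__ == "__main__":
    print(average_coverage(TABLE_2))                   # all ten projects
    print(average_coverage(TABLE_2, exclude="mathc"))  # without 'mathc'
```

An unweighted mean treats a small project like ‘mathc’ the same as a large one like ‘s2n-tls’; weighting by lines of code would be an equally defensible choice.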

Except for tool ‘C’, all tools were able to complete tests on the ‘mathc’ project. Hence, we decided to evaluate the performance and functionality of the automated tools using ‘mathc’. Although we acknowledge that evaluating on ‘mathc’ alone is insufficient, we believe the results provide an initial basis for comparing different automated testing tools.

Evaluating automated white-box testing tools

For each qualitative criterion, we assign one of three scores depending on whether or not the functionality is supported: ‘O: Supported, provision given / △: Supported, but insufficient or lack of information / X: Not supported, no provision given’. For each quantitative criterion, if the quantity is not available or disclosed by the software, we simply state ‘Not Supported’ instead of ‘X’ (which belongs to qualitative score) to prevent misunderstandings.

Property 1. Functionality

Functionality covers the most essential (functional) aspects of an automated white-box testing tool: information on the analyzed (detected) code, the number of generated test cases, code coverage, and more. As described above, the evaluation items were meant to be assessed on all 10 open-source projects, but for the reasons given above, the most essential items were evaluated and compared using the ‘mathc’ project only. For product ‘C’, testing on ‘mathc’ also failed. Its items are therefore recorded as untestable, because manually writing test code (test cases) takes considerable time and defeats our purpose of assessing automated white-box testing.

  1. Measurement of build time: We measured the build time of the ‘mathc’ project, whose source code is 159 KB in size with 5586 lines of code. Tool ‘B’ had the fastest build time at 24 seconds, while tool ‘A’ took 2 minutes and 27 seconds, a difference of about six times.
  2. Test completion time: We measured the time taken for each tool to finish the test completely. Tool ‘D’ was the fastest at 2 minutes and 50 seconds, and tool ‘A’ was the slowest at about 25 minutes. However, test time depends on the number of test cases generated (indicator 13), and a large number of test cases is not necessarily a good thing. See indicator 13 for details.
  3. Pre-build file: The number of files reported by the pre-build step, which checks in advance whether the test will proceed normally. Tool ‘B’ reported ‘1 file’ because it counts only the source file to be built, whereas tool ‘D’ reported ‘2 files’ because it also counts the header file. The other two tools could not be verified because they could not be tested or did not show the relevant information.
  4. The number of detected files: The number of source code files analyzed (detected) after the test completed. Only tool ‘B’ reported a value, ‘1 file’, again excluding the header file. The remaining three tools were rated ‘unsupported’ or untestable because they could not run the test or did not show the relevant information.
  5. The number of detected functions: The number of functions analyzed (detected) after the test completed. For products ‘B’ and ‘D’, the detected functions matched the actual number of functions exactly, but for tools ‘A’ and ‘C’ this item was not supported or not testable.
  6. The actual number of build files: The number of files in the build phase of the actual test. Errors in the optimization and pre-build phases may make this differ from the number of pre-build files. Tool ‘B’ reported ‘1 file’ because it excludes header files, and tool ‘D’ reported ‘2 files’ because it includes them. The other two tools could not be verified because they could not be tested or did not show the relevant information.
  7. The number of detected code lines: The number of code lines the tool actually treats as testable, obtained by removing non-executable lines and lines that contain only a brace ‘{ }’. Tool ‘A’ detected 2877 lines, while tool ‘B’ analyzed 4192 lines, about 1300 more than ‘A’, the best result. Tools ‘C’ and ‘D’ did not support this item or could not be tested.
  8. The number of detected branches: The number of branches (conditional statements) that the tool found in the code. As a reference, we counted the branches formed by if and else statements only (138 branches). Depending on the tool and its detection criteria, constructs such as for, call, return, and switch may also be counted: tool ‘A’ reported 1045 branches, tool ‘B’ reported 190 branches, the closest to the reference (138 branches), and tool ‘D’ reported 978 branches. There is no single correct answer for this item, because the value depends on the characteristics of each tool.
  9. Line coverage value: The percentage of the lines detected in indicator 7 that were executed during the test. For example, tool ‘A’ detected 2877 lines in indicator 7 and reported 98% coverage, meaning that it executed 2819 of the 2877 lines at least once. Tools ‘A’ and ‘D’ reported the same percentage, but the result for tool ‘D’ is ambiguous because it did not show how many lines it had detected. Tool ‘B’ covered 99.43% of its 4192 detected lines, an outstanding result.
  10. Branch coverage value: The percentage of the branches (conditional statements) detected in indicator 8 that were exercised during the test. For example, tool ‘B’ detected 190 branches in indicator 8 and reported 100% coverage. Tools ‘B’ and ‘D’ have the same coverage value, and tool ‘A’ has the lowest at 95% (excluding tool ‘C’, which could not be evaluated).
  11. Support for line coverage per code (file): Whether the tool's results (report) can show line coverage per source file rather than only the overall line coverage. All tools except ‘C’ supported this.
  12. Support for branch coverage per code (file): Whether the tool's results (report) can show branch coverage per source file rather than only the overall branch coverage. As with indicator 11, all tools except ‘C’ supported this.
  13. The total number of test cases created: The number of test cases created and run by the tool. Tool ‘A’ produced the most, 2461 test cases, and tool ‘D’ produced the fewest, 978. Tool ‘B’ produced the second-largest number, 1479. Because the tools use different algorithms and methods, the number of test cases alone cannot be judged as good or bad.
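The arithmetic behind indicators 1 and 9 can be checked with a few lines of Python. This is only a sketch: the helper names `to_seconds` and `coverage_percent` are our own, and the inputs are the figures quoted above for the ‘mathc’ project.

```python
def to_seconds(clock):
    """Convert a time written as 'MM:SS' or 'HH:MM:SS' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def coverage_percent(covered, detected):
    """Coverage as the share of detected items that were exercised at least once."""
    if detected <= 0:
        raise ValueError("detected must be positive")
    return 100.0 * covered / detected


if __name__ == "__main__":
    # Indicator 1: build time of tool 'A' (02:27) relative to tool 'B' (24 s).
    print(to_seconds("02:27") / 24)
    # Indicator 9: tool 'A' executed 2819 of its 2877 detected lines.
    print(round(coverage_percent(2819, 2877)))
```

Note that a coverage percentage is always relative to what each tool itself detected, so two tools reporting the same percentage may have exercised very different numbers of lines.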

– We note that code coverage for the ‘mathc’ project was higher than for the other projects, as seen in Table 2. Tools ‘A’ and ‘D’ also achieved high coverage on it. We suspect this is because ‘mathc’ consists of functions that return the results of simple arithmetic operations, with no dependencies between source files, so many lines of code are easy to analyze.
– Conversely, for a project with many dependencies between source files, testing tends to fail because it is difficult for the white-box testing tool to configure the build environment.
– The most important thing in an automated white-box testing tool is to achieve high coverage with few test cases. In other words, testing is efficient when the number of test cases is small and the coverage is high.
– A large number of test cases increases the likelihood of duplicate test cases (repeated values) that do not contribute to coverage.
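The point about duplicate test cases can be illustrated with a greedy test-suite reduction, sketched below under stated assumptions. The suite `SUITE` and its line numbers are hypothetical, and this greedy set-cover heuristic is a textbook technique rather than the method used by any of the evaluated tools.

```python
def reduce_suite(coverage_by_test):
    """Greedy set cover: repeatedly keep the test that adds the most uncovered lines.

    coverage_by_test maps a test name to the set of line numbers it executes.
    Returns the kept test names and the set of lines they cover together.
    """
    remaining = dict(coverage_by_test)
    covered = set()
    kept = []
    while remaining:
        # Pick the test with the largest gain; ties go to the earliest entry.
        name = max(remaining, key=lambda n: len(remaining[n] - covered))
        gain = remaining.pop(name) - covered
        if not gain:
            break  # every remaining test is redundant for coverage
        kept.append(name)
        covered |= gain
    return kept, covered


# Hypothetical suite: t2 and t4 execute nothing that t1 does not already execute.
SUITE = {
    "t1": {1, 2, 3},
    "t2": {2, 3},
    "t3": {4},
    "t4": {1, 2, 3},
    "t5": {3, 4, 5},
}

if __name__ == "__main__":
    print(reduce_suite(SUITE))
```

In this example two of the five tests are enough to reach the same line coverage as the whole suite. The greedy choice is not guaranteed to find the smallest possible subset, and tests that look redundant for line coverage may still check different input values.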


| Evaluation Item | No. | Evaluation Criterion | A | B | C | D |
|---|---|---|---|---|---|---|
| Accuracy | 1 | Build time for the project (code file size: 159 KB, code lines: 5586) | 02:27 | 24s | Unable to test | 52s |
| | 2 | Time for the white-box testing tool to complete the test during project analysis | 25:04 | 10:21 | Unable to test | 02:50 |
| | 3 | How many pre-built files appeared on the report or completion screen when analyzing the project (1 header, 1 source code)? ※ Ideally the number matches the actual source code files; depending on the analysis tool, header files may be excluded from the number of builds. | Unsupported | 1 | Unable to test | 2 |
| | 4 | How many detected files (source code) appeared on the report or completion screen when analyzing the project (1 header, 1 source code)? | Unsupported | 1 | Unable to test | Unsupported |
| | 5 | How many functions were detected in this project with 843 functions? | Unsupported | 843 | Unable to test | 843 |
| | 6 | How many actually built files appeared on the report or completion screen when analyzing the project (1 header, 1 source code)? ※ Ideally the number matches the actual source code files; depending on the analysis tool, header files may be excluded from the number of builds. | Unsupported | 1 | Unable to test | 2 |
| | 7 | How many lines of code were detected in the report or completion screen when analyzing a project with 5586 code lines, excluding spaces and comments? | 2877 | 4192 | Unable to test | Unsupported |
| | 8 | Number of analyzed branches in a project with 138 branches, counting conditional statements in the source code (excluding For, Call, Return, switch)? ※ Depending on the analysis tool, the branch value may differ from the actual number of branches when all conditions such as For, Call, Return, switch, etc. are considered. | 1045 | 190 | Unable to test | 978 |
| | 9 | What is the value of the tested line coverage? ※ Percentage of the lines from indicator 7 that were tested. Example: tool ‘A’ detects 2877 lines and tests 2819 of them (98%). | 98% | 99.43% | Unable to test | 98% |
| | 10 | What is the value of the tested branch coverage? ※ Percentage of the branches from indicator 8 that were tested. Example: tool ‘B’ detects 190 branches and tests all 190 (100%). | 95% | 100% | Unable to test | 100% |
| | 11 | Possible to support or verify line coverage per code (file)? | O | O | X | O |
| | 12 | Possible to support or verify branch coverage per code (file)? | O | O | X | O |
| | 13 | Total number of test cases generated, as shown in reports and results screens after test completion? | 2461 | 1479 | Unable to test | 978 |
| Security | 14 | Is there a function that is accessible only to authorized users in the test program or system? | X | O | X | O |
| | 15 | Is it possible to analyze the source code without external leakage? | O | O | | O |

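One way to read the remark that efficient testing means high coverage with few test cases is to relate the reported line coverage to the number of generated test cases. The sketch below does this for the three tools that completed the ‘mathc’ test, using the values from the table above. The metric itself (coverage points per 100 test cases) is our own illustrative assumption and was not part of the evaluation criteria.

```python
# (line coverage %, number of generated test cases) for the 'mathc' project.
MATHC_RESULTS = {
    "A": (98.0, 2461),
    "B": (99.43, 1479),
    "D": (98.0, 978),
}


def coverage_per_100_tests(line_coverage, test_cases):
    """Line-coverage percentage points obtained per 100 generated test cases."""
    if test_cases <= 0:
        raise ValueError("test_cases must be positive")
    return 100.0 * line_coverage / test_cases


if __name__ == "__main__":
    for tool, (coverage, tests) in MATHC_RESULTS.items():
        print(tool, round(coverage_per_100_tests(coverage, tests), 2))
```

Because each tool detects a different number of lines (indicator 7), the percentages are not measured against the same baseline, so this ratio is only a rough indication and should not be read as a ranking.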
Property 2. Usability

Usability is an overall assessment of how easily users can learn and use the tool. For criterion no. 17, the manuals of products ‘B’ and ‘C’ were written in the order of use, making the workflow easy to follow. The manuals for tools ‘A’ and ‘D’, however, were of poor quality, which made it difficult for users to set up the test environment themselves. For criterion no. 18, most manuals focused on the primary functions without describing them in detail, which made setup difficult. Product ‘B’, in contrast, explained every function in detail, including which elements are required for a test and how changing them affects performance. In particular, tooltips were provided for each feature, making detailed features easy to use and understand. For criterion no. 22, product ‘B’ was very convenient compared to the other tools because it could run tests without an external compiler being installed. The other three also had internal compilers, but these often did not work properly, or a separate compiler had to be installed for each project. For criterion no. 25, tools ‘B’ and ‘D’ allow the user to select specific modules and functions and generate only the desired information as a report, but they were rated ‘△’ because the selectable areas are limited.

| Evaluation Item | No. | Evaluation Criterion | A | B | C | D |
|---|---|---|---|---|---|---|
| User learning accessibility | 16 | In what languages are help and manuals available? | English | English, Korean | English, Korean | English |
| | 17 | Are the help and manuals produced in the same order as the order of use? | △ | O | O | △ |
| | 18 | Do the help and manuals provide descriptions of the detailed features of the test tool? | △ | O | △ | △ |
| Input data understanding | 19 | How many formats and methods are supported for input data? | 2 formats (File & directory) | 2 formats (Local or remote directory) | 3 formats (Single & multi file, and project file) | 2 formats (File & directory) |
| Understanding progress accessibility | 20 | Does it provide UI/UX so that you can easily understand the progress of tests? | O | O | X | O |
| Suitability for installation environment | 21 | What operating systems can be supported? | Windows, Linux | Windows, Linux, Mac | Windows, Linux, Mac | Windows, Linux |
| | 22 | Does the user not need to install the right compiler for the build environment? | △ | O | △ | △ |
| Uninstallation accessibility | 23 | Is the product installed and removed normally? | O | O | O | O |
| Possibility of generating a report | 24 | In what formats can the report be generated? | html, text | html, csv, excel | html | xml, html, text |
| | 25 | Can the user select a report configuration when generating a report? | X | △ | X | △ |

Property 3. Reliability

Reliability indicators evaluate the white-box testing tool's own stability and its resilience when a problem occurs. For criterion no. 26, only tool ‘A’ could be tested reliably. The remaining three tools either failed to run the test, terminated abnormally with an error message, or froze during the test and had to be forcibly terminated, so they were rated ‘X’. For criterion no. 27, tool ‘A’ was rated ‘O’ because it operated stably and no failure occurred, and tool ‘B’ was rated ‘O’ because it skipped the parts already completed and resumed from the point of failure. The other two tools were rated ‘X’ because they had to start a new test rather than resume from where they had stopped.

| Evaluation Item | No. | Evaluation Criterion | A | B | C | D |
|---|---|---|---|---|---|---|
| Operational Stability | 26 | Were there any errors during the continuous testing? (3 times tested) | O | X | X | X |
| Recoverability | 27 | If a failure occurs during the test, is the test conducted from the point of failure? | O | O | X | X |

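The per-property summaries in Table 3 are simple tallies of the ‘O’, ‘△’, and ‘X’ marks. As a small sketch, the code below counts the marks for the two Reliability criteria above. The dictionary layout and the function name `tally` are our own choices for illustration.

```python
from collections import Counter

TOOLS = ("A", "B", "C", "D")

# Marks for tools A, B, C, D, in that order, from the Reliability table.
RELIABILITY = {
    "Operational Stability": "OXXX",
    "Recoverability": "OOXX",
}


def tally(ratings):
    """Count how many times each tool received each mark."""
    counts = {tool: Counter() for tool in TOOLS}
    for marks in ratings.values():
        for tool, mark in zip(TOOLS, marks):
            counts[tool][mark] += 1
    return counts


if __name__ == "__main__":
    for tool, marks in tally(RELIABILITY).items():
        print(tool, dict(marks))
```

The resulting counts agree with the Reliability rows of Table 3: two ‘O’ for tool ‘A’, one ‘O’ and one ‘X’ for tool ‘B’, and two ‘X’ each for tools ‘C’ and ‘D’.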
Property 4. Maintainability

Maintainability covers how problems are identified and resolved with technical support from suppliers or maintenance companies, and how well the target (project) under test can be managed from the user's point of view. For criterion no. 28, the three tools rated ‘△’ either gave no information on what action to take when an error message or an error of unknown cause occurred, or had no related documentation. Tool ‘A’ provided solutions for some simple errors, for example when a stub did not exist during the test. For criteria no. 32 and no. 33, tool ‘B’ separates organizations or teams by issuing separate accounts, while tools ‘A’ and ‘D’ do so through separately installed extensions.

| Evaluation Item | No. | Evaluation Criterion | A | B | C | D |
|---|---|---|---|---|---|---|
| Problem Diagnosis and Support | 28 | If an error occurs, does it provide an error code and solution? | O | △ | △ | △ |
| | 29 | Does it support Q&A or FAQ? | O | O | X | X |
| | 30 | Can the problem be quickly resolved through a maintenance company in Korea? | O | O | O | O |
| | 31 | Are there periodic product updates and feature additions? | O | O | X | O |
| Organization and project management | 32 | Is it possible to separate workspace by organization or team? | O | O | O | O |
| | 33 | Is it possible to set permissions and policies per organization or team? | O | X | X | X |
| Project backup and recovery accessibility | 34 | Is it possible to back up working project settings and restore them if needed? | O | O | O | O |

Conclusion

In this post, we systematically set up evaluation criteria for comparing the performance and functionality of different automated white-box software testing tools. We qualitatively and quantitatively tested four automated white-box testing tools against the 34 evaluation criteria we developed, using 10 selected open-source projects. A summary of the evaluation results is shown in Table 3. For each evaluation property, the product that showed the best result is marked in blue.

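| Property | Rating | A | B | C | D |
|---|---|---|---|---|---|
| Functionality | O | 3 | 4 | 0 | 4 |
| | △ | 0 | 0 | 3 | 0 |
| | X | 1 | 0 | 1 | 0 |
| Usability | O | 2 | 5 | 2 | 2 |
| | △ | 3 | 1 | 2 | 4 |
| | X | 1 | 0 | 2 | 0 |
| Reliability | O | 2 | 1 | 0 | 0 |
| | △ | 0 | 0 | 0 | 0 |
| | X | 0 | 1 | 2 | 2 |
| Maintainability | O | 7 | 5 | 2 | 3 |
| | △ | 0 | 1 | 1 | 1 |
| | X | 0 | 1 | 4 | 3 |
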
Table 3. Overall Evaluation Table
| Rating | A | B | C | D |
|---|---|---|---|---|
| O | 13 | 15 | 5 | 9 |
| △ | 3 | 2 | 6 | 5 |
| X | 3 | 2 | 8 | 5 |

Table 4. Comprehensive Total Evaluation Table

Note that in the comprehensive evaluation (Table 4), having more ‘O’ marks does not necessarily mean that a tool is superior to the others. In most practical use cases, quantitative aspects such as faster build time, broader code coverage, or faster generation of test cases will matter more when evaluating testing tools. Also note that, unlike the other tools, tool ‘B’ was able to complete testing of the complex projects (that is, the nine projects other than ‘mathc’) without much difficulty in configuring the compilation environment. Over the course of this study, we found it very interesting to define evaluation indicators for automated white-box testing tools, to examine the importance of such tools, and to see test cases created and run automatically without user intervention. We envision that if one properly acknowledges the strengths and weaknesses of the different automated software testing tools and uses them according to their purpose, one can improve not only the safety of the system but also the overall quality of the software development process.

강식 신

A member of the malware analysis team at the KAIST Cyber Security Research Center, working on malware analysis programs and research.

영락 유

A researcher on the malware analysis team at the KAIST Cyber Security Research Center, conducting research on blockchain and software testing.
